
Kolmogorov–Arnold networks in molecular dynamics

Yuki Nagai Information Technology Center, The University of Tokyo, 6–2–3 Kashiwanoha, Kashiwa, Chiba 277–0882, Japan [email protected]    Masahiko Okumura Center for Computational Science and e-Systems, Japan Atomic Energy Agency, Kashiwa, Chiba 277–0871, Japan
Abstract

We explore the integration of Kolmogorov–Arnold Networks (KANs) into molecular dynamics (MD) simulations to improve interatomic potentials. We propose that widely used potentials, such as the Lennard–Jones (LJ) potential, the embedded atom model (EAM), and artificial neural network (ANN) potentials, can be interpreted within the KAN framework. Specifically, we demonstrate that the descriptors for ANN potentials, typically constructed using polynomials, can be redefined using KAN’s non-linear functions. By employing linear or cubic spline interpolations for these KAN functions, we show that the computational cost of evaluating ANN potentials and their derivatives is reduced.

preprint: APS/123-QED

I Introduction

Molecular dynamics (MD) simulations have revolutionized the field of materials science, providing detailed atomistic insights into the structure, properties, and behavior of materials under various conditions [1, 2, 3, 4, 5]. By solving Newton’s equations of motion for a system of interacting atoms, MD simulations can model processes such as diffusion, phase transitions, mechanical deformation, and chemical reactions. Despite their widespread utility, traditional MD simulations are often limited by the accuracy and efficiency of the force fields used to approximate interatomic forces.

Ab initio molecular dynamics (AIMD), particularly those based on density functional theory (DFT), offer enhanced accuracy by explicitly accounting for the electronic structure of materials [6, 7]. However, the computational cost of AIMD is prohibitive for simulating large systems or long-time scales, restricting its application to relatively small and short simulations. This limitation underscores the need for more efficient computational methods that do not compromise on accuracy.

Machine learning (ML) has emerged as a transformative approach to address these challenges in MD simulations. By training on energies obtained from DFT calculations, ML models, particularly artificial neural networks (ANNs), can accurately predict potential energy surfaces and interatomic forces [8]. Integrating ML with MD, known as machine learning molecular dynamics (MLMD), offers a promising pathway to enhance both the accuracy and efficiency of simulations [9, 10, 11, 12, 13, 14].

Developing novel and effective ML methods tailored specifically for MD simulations is crucial to further unlocking the potential of MLMD [14, 9, 12, 13, 15, 16, 17]. For MLMD, multi-layer perceptrons (MLPs) are mainly used as the ANN architecture. Recently, Kolmogorov–Arnold networks (KANs) have been proposed as an alternative to MLPs [18]. They are an extension of the Kolmogorov–Arnold representation theorem [19, 20], which states that any continuous multivariate function can be represented as a superposition of continuous functions of a single variable. This theoretical framework provides a powerful tool for decomposing complex functions into simpler, more manageable component functions.

In this paper, we explore the relation between KANs and potential energies in molecular dynamics. From the KAN perspective, the Lennard–Jones (LJ) potential, the embedded atom model (EAM), and ANN potentials can all be viewed as forms of KANs. Utilizing linear or cubic spline interpolations, we can reduce the computational cost of the ANN potential.

This paper is organized as follows. In Sec. II, we introduce KANs. In Sec. III, we discuss the relation between KANs and the potentials used in molecular dynamics. The KAN descriptor is introduced in Sec. IV. The numerical results are shown in Sec. V, where we also introduce linear and cubic spline interpolations to approximate the KAN descriptor and speed up simulations. The conclusion is given in Sec. VI.

II Kolmogorov–Arnold Networks

II.1 Kolmogorov–Arnold representation theorem

Recently, KANs have been proposed, inspired by the Kolmogorov–Arnold representation theorem [18]. The Kolmogorov–Arnold representation theorem shows that a multivariate function $f(\pmb{x})=f(x_{1},x_{2},\ldots,x_{n})$ can be represented by a superposition of univariate functions [19, 20]. For a smooth $f:[0,1]^{n}\rightarrow\mathbb{R}$, the function $f(\pmb{x})$ is represented as

f(\pmb{x})=\sum_{q=1}^{2n+1}\Phi_{q}\!\left(\sum_{p=1}^{n}\phi_{q,p}(x_{p})\right), (1)

where $\phi_{q,p}:[0,1]\rightarrow\mathbb{R}$ and $\Phi_{q}:\mathbb{R}\rightarrow\mathbb{R}$ (see Fig. 1).

Figure 1: The schematic of the Kolmogorov–Arnold representation theorem.

For example, one can easily confirm that the multiplication $x_{1}x_{2}$ can be represented by a superposition of univariate functions in the form of Eq. (1),

x_{1}x_{2}=\exp\!\left(\sum_{i=1}^{2}\log x_{i}\right). (2)
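As a quick numerical check of Eq. (2), the following snippet (plain Julia; the test values are arbitrary choices of ours) compares the direct product with the superposition of the two univariate functions log and exp.

# Check Eq. (2): x1*x2 = exp(log(x1) + log(x2)) for positive inputs.
x1, x2 = 0.7, 1.9                  # arbitrary positive test values
inner = log(x1) + log(x2)          # inner univariate functions: phi(x) = log(x)
outer = exp(inner)                 # outer univariate function: Phi(y) = exp(y)
println(outer ≈ x1 * x2)           # prints true (up to floating-point rounding)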

II.2 Kolmogorov–Arnold Networks

Liu et al. have extended the idea of the Kolmogorov–Arnold representation theorem [18]. By regarding Eq. (1) as a two-layer network with one hidden layer of $2n+1$ neurons, a more general form of the network can be given by

f({\bm{x}})=\sum_{i_{L}=1}^{n_{L}}\Phi_{i_{L}}\!\left(x^{(L)}_{i_{L}}\right), (3)
x^{(L)}_{i_{L}}=\sum_{i_{L-1}=1}^{n_{L-1}}\phi_{L-1,i_{L},i_{L-1}}\!\left(x_{i_{L-1}}^{(L-1)}\right), (4)
x^{(L-1)}_{i_{L-1}}=\sum_{i_{L-2}=1}^{n_{L-2}}\phi_{L-2,i_{L-1},i_{L-2}}\!\left(x_{i_{L-2}}^{(L-2)}\right), (5)
\vdots (6)
x^{(2)}_{i_{2}}=\sum_{i_{1}=1}^{n_{1}}\phi_{1,i_{2},i_{1}}\!\left(x_{i_{1}}^{(1)}\right), (7)
x^{(1)}_{i_{1}}=\sum_{i_{0}=1}^{n_{0}}\phi_{0,i_{1},i_{0}}(x_{i_{0}})\,, (8)

which is called KAN. The schematic of KAN is shown in Fig. 2.

Figure 2: The schematic of Kolmogorov–Arnold networks.

The multivariate function $f({\bm{x}})$ is represented by a superposition of non-linear "activation" functions $\phi_{l,qp}(x)$. To construct the function $f({\bm{x}})$, the shapes of the non-linear functions $\phi_{l,qp}(x)$ are optimized. For example, in the implementation by Liu et al., each KAN layer is composed of the sum of a spline function and the SiLU activation function. Various kinds of basis functions can be used, such as Legendre polynomials, Chebyshev polynomials, and Gaussian radial basis functions [21, 22, 23, 24].
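As an illustration of Eqs. (3)–(8), a minimal sketch of a single KAN layer is given below (plain Julia; the names chebyshev and kan_layer, the choice of a Chebyshev basis, and the random coefficients are our own, not the implementation of Ref. [18]). Each activation function is expanded in a small basis whose coefficients would be trained.

# Chebyshev polynomials T_0(x), ..., T_S(x) evaluated at x in [-1, 1].
chebyshev(x, S) = [cos(s * acos(clamp(x, -1.0, 1.0))) for s in 0:S]

# One KAN layer: y_i = sum_j phi_{i,j}(x_j), with each phi_{i,j} expanded in
# S + 1 Chebyshev coefficients stored in W[i, j, :].
function kan_layer(x::Vector{Float64}, W::Array{Float64,3})
    n_out, n_in, S1 = size(W)
    y = zeros(n_out)
    for i in 1:n_out, j in 1:n_in
        T = chebyshev(x[j], S1 - 1)
        y[i] += sum(W[i, j, :] .* T)
    end
    return y
end

W = 0.1 .* randn(4, 3, 6)   # 3 inputs -> 4 outputs, 6 Chebyshev coefficients per function
println(kan_layer([0.2, -0.5, 0.9], W))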

III Potential energy in molecular dynamics

III.1 Various kinds of potential energies

In the field of molecular dynamics, accurately estimating the forces on each atom is crucial. The force on the $i$-th atom at $\pmb{R}_{i}$ is derived from the derivative of the total potential energy $E$, i.e.,

\pmb{F}_{i}=-\frac{\partial E}{\partial\pmb{R}_{i}}\,. (9)

An accurate and fast estimation of the potential energy is therefore one of the most important challenges.
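To make Eq. (9) concrete, the sketch below (plain Julia; the toy pair energy and the finite-difference evaluation are our own illustrations, whereas production MD codes evaluate the derivative analytically) computes the force on one atom as the negative gradient of the total energy.

# Toy total energy: sum of 1/R_ij over all pairs (illustration only).
function toy_energy(pos)
    N = length(pos)
    e = 0.0
    for i in 1:N-1, j in i+1:N
        e += 1 / sqrt(sum((pos[i] .- pos[j]) .^ 2))
    end
    return e
end

# Force on atom i from Eq. (9), F_i = -dE/dR_i, here by central finite differences.
function force(energy, pos, i; h = 1e-6)
    F = zeros(3)
    for a in 1:3
        p1 = deepcopy(pos); p1[i][a] += h
        p2 = deepcopy(pos); p2[i][a] -= h
        F[a] = -(energy(p1) - energy(p2)) / (2h)
    end
    return F
end

pos = [[0.0, 0.0, 0.0], [1.5, 0.2, 0.0], [0.1, 1.4, 0.3]]
println(force(toy_energy, pos, 1))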

Various kinds of potentials have been proposed for molecular dynamics. For example, the LJ potential is a simplified model that describes the essential features of interactions between simple atoms and molecules [25]. The EAM is an approximation describing the interatomic potential and is particularly appropriate for metallic systems [26].

Calculations based on DFT, known as ab initio calculations, are among the most accurate methods for evaluating potential energy. AIMD has many applications in material design. However, reducing the computational effort required for AIMD remains a key issue for its broader application to phenomena on large length and time scales.

The use of ANNs, which imitate DFT energies through machine learning, is seen as a promising solution to this issue [8]. Research in machine-learning molecular dynamics has grown rapidly over the last decade [14, 9, 12, 13, 15, 16, 17, 11]. Various network structures have been proposed, including graph neural networks [27, 28, 9, 10, 11, 12, 13, 14] and transformer architectures [15].

III.2 Relation between Kolmogorov–Arnold networks and potential energy

III.2.1 Lennard–Jones potential

Let us discuss the relation between KAN and potential energy in molecular dynamics. Using the LJ potential, the total potential energy for $N$ atoms is expressed as

E^{\mathrm{LJ}}(\pmb{R})=\sum_{i=1}^{N}E_{i}^{\mathrm{LJ}}(\pmb{R})\,, (10)
E_{i}^{\mathrm{LJ}}(\pmb{R})=\frac{1}{2}\sum_{j=1,j\neq i}^{N}V_{\mathrm{LJ}}(R_{ij})\,, (11)

where, with empirically determined coefficients $A_{m}$ and $B_{n}$,

V_{\mathrm{LJ}}(R)=\frac{A_{m}}{R^{m}}-\frac{B_{n}}{R^{n}}\,, (12)
R_{ij}=|\pmb{R}_{ij}|\,, (13)
\pmb{R}_{ij}=\pmb{R}_{j}-\pmb{R}_{i}\,. (14)

The LJ potential can be represented by a type of KAN, as shown in Fig. 3.

Figure 3: The schematic of Lennard–Jones potential.

As shown in Eq. (3), a non-linear function generally depends on the indices of both the input and output neurons, $i_{L-1}$ and $i_{L}$. In the LJ potential model, the elements of the initial input are the distances between pairs of atoms, which should have permutation symmetry. Therefore, all non-linear functions are $V_{\mathrm{LJ}}(R)$, similar to weight sharing in MLP networks. We call this feature "function sharing."
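A minimal sketch of this function sharing is given below (plain Julia; the coefficients and atomic positions are arbitrary test values of ours): the single univariate function V_LJ of Eq. (12) is applied to every pair distance and the results are summed as in Eqs. (10) and (11).

# Lennard-Jones pair function V(R) = A/R^m - B/R^n, Eq. (12).
lj(R; A = 1.0, B = 1.0, m = 12, n = 6) = A / R^m - B / R^n

# Total energy, Eqs. (10)-(11): the same univariate function is shared by all pairs.
function lj_energy(positions::Vector{Vector{Float64}})
    N = length(positions)
    E = 0.0
    for i in 1:N, j in 1:N
        i == j && continue
        Rij = sqrt(sum((positions[j] .- positions[i]) .^ 2))
        E += 0.5 * lj(Rij)
    end
    return E
end

positions = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0]]
println(lj_energy(positions))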

III.2.2 Embedded atom model potential

The total potential energy in the EAM is divided into two contributions, namely a pairwise part and a local density part:

E^{\mathrm{EAM}}(\pmb{R})=\sum_{i=1}^{N}E_{i}^{\mathrm{EAM}}({\bm{R}})\,, (15)
E_{i}^{\mathrm{EAM}}({\bm{R}})=F_{t_{i}}\!\left(n_{i}(\pmb{R})\right)+\frac{1}{2}\sum_{j=1,j\neq i}^{N}\phi_{t_{i}t_{j}}(R_{ij})\,, (16)
n_{i}(\pmb{R})=\sum_{j=1,j\neq i}^{N}\rho_{t_{j}}(R_{ij})\,. (17)

The functions $F_{\alpha}(n)$, $\phi_{\alpha\beta}(R)$, and $\rho_{\alpha}(R)$ are constructed empirically or by fitting to potential energies calculated by DFT. In actual simulations, the values of these functions are evaluated by linear interpolation. In terms of KAN, the local potential energy $E_{i}^{\mathrm{EAM}}$ can be rewritten as

E_{i}^{\mathrm{EAM}}(\pmb{R})=F_{t_{i}}\!\left(n_{i}(\pmb{R})\right)+I\left(m_{i}(\pmb{R})\right)\,, (18)
I(x)=x\,, (19)
m_{i}(\pmb{R})=\sum_{j=1,j\neq i}^{N}\frac{1}{2}\phi_{t_{i}t_{j}}(R_{ij})\,, (20)

where $t_{i}$ is the element type of the $i$-th atom. The EAM potential can be represented by a type of KAN, as shown in Fig. 4.

Figure 4: The schematic of the EAM potential.

In terms of KAN, the EAM potential can be extended by adding more hidden layers and/or units.
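The two-layer structure of Eqs. (15)–(20) can be sketched as follows (plain Julia for a single element; the functional forms of F, phi, and rho are toy choices of ours, not fitted ones): the inner functions rho are summed into a density, and the outer function F is applied to that sum.

# Toy single-element EAM functions (illustrative forms only, not fitted).
rho(R) = exp(-R)           # electron-density contribution rho(R)
phi(R) = exp(-2R) / R      # pair term phi(R)
F(n)   = -sqrt(n)          # embedding function F(n)

function eam_energy(positions::Vector{Vector{Float64}})
    N = length(positions)
    E = 0.0
    for i in 1:N
        n_i, m_i = 0.0, 0.0
        for j in 1:N
            i == j && continue
            Rij = sqrt(sum((positions[j] .- positions[i]) .^ 2))
            n_i += rho(Rij)         # density, Eq. (17)
            m_i += 0.5 * phi(Rij)   # pairwise part, Eq. (20)
        end
        E += F(n_i) + m_i           # Eq. (18) with I(x) = x
    end
    return E
end

println(eam_energy([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 2.5, 0.0]]))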

III.2.3 Artificial neural network potential

There are various kinds of ANN potentials. In this paper, we focus on Behler–Parrinello type neural networks [8]. In their formulation, the potential energy is assumed to be constructed from a sum of fictitious atomic energies, similar to the LJ and EAM potentials. Furthermore, the atomic energy is considered a functional of the atomic environment around the central atom, defined inside a sphere with a cutoff radius $R_{\mathrm{c}}$. Feature values of the atomic environment, called descriptors, are introduced to ensure the translational and rotational invariance of the atomic energy. Therefore, the total and atomic energies are given as

E^{\mathrm{ANN}}(\pmb{R})=\sum_{i=1}^{N}E_{t_{i}}^{\mathrm{ANN}}\left(\pmb{d}_{i}(\pmb{R};R_{\mathrm{c}})\right), (21)

where $E^{\mathrm{ANN}}$, $E_{t_{i}}^{\mathrm{ANN}}$, and $\pmb{d}_{i}$ are the total energy, the atomic energy of the $i$-th atom, and the descriptor of the atomic environment centered on the $i$-th atom, respectively. There are many kinds of descriptors. For example, in the case of the Chebyshev descriptor [11], the descriptor is defined as

\pmb{d}_{i}(\pmb{R};R_{\mathrm{c}})=\begin{pmatrix}d_{i;0}(\pmb{R};R_{\mathrm{c}})\\ \vdots\\ d_{i;N_{0}}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}=\begin{pmatrix}\pmb{d}_{i}^{(\mathrm{R})}(\pmb{R};R_{\mathrm{c}})\\ \pmb{d}_{i}^{(\mathrm{A})}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}, (22)
\pmb{d}_{i}^{(\mathrm{R})}(\pmb{R};R_{\mathrm{c}})=\begin{pmatrix}\pmb{d}_{i}^{(\mathrm{r})}(\pmb{R};R_{\mathrm{c}})\\ \pmb{d}_{i}^{(\mathrm{r,w})}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}=\begin{pmatrix}(d_{i;0}^{(\mathrm{r})}(\pmb{R};R_{\mathrm{c}}),\cdots,d_{i;N^{(\mathrm{r})}}^{(\mathrm{r})}(\pmb{R};R_{\mathrm{c}}))^{\mathrm{T}}\\ (d_{i;0}^{(\mathrm{r,w})}(\pmb{R};R_{\mathrm{c}}),\cdots,d_{i;N^{(\mathrm{r})}}^{(\mathrm{r,w})}(\pmb{R};R_{\mathrm{c}}))^{\mathrm{T}}\end{pmatrix}, (23)
\pmb{d}_{i}^{(\mathrm{A})}(\pmb{R};R_{\mathrm{c}})=\begin{pmatrix}\pmb{d}_{i}^{(\mathrm{a})}(\pmb{R};R_{\mathrm{c}})\\ \pmb{d}_{i}^{(\mathrm{a,w})}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}=\begin{pmatrix}(d_{i;0}^{(\mathrm{a})}(\pmb{R};R_{\mathrm{c}}),\cdots,d_{i;N^{(\mathrm{a})}}^{(\mathrm{a})}(\pmb{R};R_{\mathrm{c}}))^{\mathrm{T}}\\ (d_{i;0}^{(\mathrm{a,w})}(\pmb{R};R_{\mathrm{c}}),\cdots,d_{i;N^{(\mathrm{a})}}^{(\mathrm{a,w})}(\pmb{R};R_{\mathrm{c}}))^{\mathrm{T}}\end{pmatrix}, (24)
d_{i;s}^{(\mathrm{r})}(\pmb{R};R_{\mathrm{c}})=\sum_{j=1,j\neq i}^{N}T_{s}(\bar{R}(\pmb{R}_{ij};R_{\mathrm{c}}))f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}}), (25)
d_{i;s}^{(\mathrm{r},\mathrm{w})}(\pmb{R};R_{\mathrm{c}})=\sum_{j=1,j\neq i}^{N}T_{s}(\bar{R}(\pmb{R}_{ij};R_{\mathrm{c}}))f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})w_{t_{j}}, (26)
d_{i;s}^{(\mathrm{a})}(\pmb{R};R_{\mathrm{c}})=\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}T_{s}\!\left(c(\pmb{R}_{ij},\pmb{R}_{ik})\right)f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})f_{\mathrm{c}}(\pmb{R}_{ik};R_{\mathrm{c}}), (27)
d_{i;s}^{(\mathrm{a},\mathrm{w})}(\pmb{R};R_{\mathrm{c}})=\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}T_{s}\!\left(c(\pmb{R}_{ij},\pmb{R}_{ik})\right)f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})f_{\mathrm{c}}(\pmb{R}_{ik};R_{\mathrm{c}})w_{t_{j}}w_{t_{k}}, (28)

where

\bar{R}(\pmb{R};R_{\mathrm{c}})=2\frac{|\pmb{R}|}{R_{\mathrm{c}}}-1, (29)
c(\pmb{R},\pmb{R}^{\prime})=\frac{\pmb{R}\cdot\pmb{R}^{\prime}}{|\pmb{R}||\pmb{R}^{\prime}|}, (30)
N_{0}=2N^{(\mathrm{r})}+2N^{(\mathrm{a})}+3, (31)

$T_{s}(x)$ is the $s$-th Chebyshev polynomial of the first kind, and $f_{\mathrm{c}}(\pmb{R};R_{\mathrm{c}})$ is a cutoff function that smoothly goes to zero at $R_{\mathrm{c}}$. For example, the following function is used in many cases:

f_{\mathrm{c}}(\pmb{R};R_{\mathrm{c}})=\begin{cases}\frac{1}{2}\left[\cos\!\left(\pi\frac{|\pmb{R}|}{R_{\mathrm{c}}}\right)+1\right]&(0\leq|\pmb{R}|\leq R_{\mathrm{c}})\\ 0&(R_{\mathrm{c}}<|\pmb{R}|)\end{cases}. (32)
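A sketch of the radial part of the descriptor, Eqs. (25) and (29) together with the cutoff of Eq. (32), is given below (plain Julia; the function names, the neighbour distances, and the parameter values are our own illustrations).

# Cutoff function, Eq. (32).
fc(R, Rc) = R <= Rc ? 0.5 * (cos(pi * R / Rc) + 1) : 0.0

# Map a distance to [-1, 1], Eq. (29).
rbar(R, Rc) = 2R / Rc - 1

# Radial Chebyshev descriptor components d_{i;s}^(r), Eq. (25), for one atom,
# given the distances to its neighbours within the cutoff sphere.
function radial_descriptor(distances::Vector{Float64}, Nr::Int, Rc::Float64)
    d = zeros(Nr + 1)
    for R in distances
        x = clamp(rbar(R, Rc), -1.0, 1.0)
        for s in 0:Nr
            d[s + 1] += cos(s * acos(x)) * fc(R, Rc)   # T_s(x) * f_c(R)
        end
    end
    return d
end

println(radial_descriptor([2.1, 2.5, 3.3], 5, 6.0))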

We introduce the shorthand notations as

\pmb{d}^{(\cdot)}_{i}=\pmb{d}^{(\cdot)}_{i}(\pmb{R};R_{\mathrm{c}})\,, (33)
d^{(\cdot)}_{i;s}=d^{(\cdot)}_{i;s}(\pmb{R};R_{\mathrm{c}})\,, (34)
\bar{R}_{ij}=\bar{R}(\pmb{R}_{ij};R_{\mathrm{c}})\,, (35)
c_{ijk}=c(\pmb{R}_{ij},\pmb{R}_{ik})\,, (36)
\bar{f}_{\mathrm{c}}(R_{ij})=f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})\,. (37)

When one considers an MLP with two hidden layers, the $i$-th atomic energy is given as

E_{t_{i}}^{\mathrm{ANN}}(\pmb{d}_{i})=\sum_{\ell_{2}=1}^{N_{2}}W^{(3)}_{t_{i};1\ell_{2}}x^{(2)}_{i;\ell_{2}}+b_{t_{i}}^{(3)}\,, (38)
x^{(2)}_{i;\ell_{2}}=\sigma^{(2)}\!\left(\sum_{\ell_{1}=1}^{N_{1}}W^{(2)}_{t_{i};\ell_{2}\ell_{1}}x^{(1)}_{i;\ell_{1}}+b_{t_{i};\ell_{2}}^{(2)}\right), (39)
x^{(1)}_{i;\ell_{1}}=\sigma^{(1)}\!\left(\sum_{\ell_{0}=0}^{N_{0}}W^{(1)}_{t_{i};\ell_{1}\ell_{0}}\bar{d}_{i;\ell_{0}}(\pmb{R};R_{\mathrm{c}})+b_{t_{i};\ell_{1}}^{(1)}\right), (40)

where $\sigma^{(l)}(x)$ is an activation function of the $l$-th layer, such as $\tanh(x)$, and $\bar{\pmb{d}}_{i}$ is the normalized descriptor, whose elements are defined as

\bar{d}_{i;\ell_{0}}=\frac{d_{i;\ell_{0}}-m_{t_{i};\ell_{0}}}{\sigma_{t_{i};\ell_{0}}}=\begin{cases}\frac{d_{i;\ell_{0}}^{(\mathrm{R})}-m_{t_{i};\ell_{0}}^{(\mathrm{R})}}{\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}}&(\ell_{0}\leq 2N^{(\mathrm{r})}+1)\\ \frac{d_{i;\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}-m_{t_{i};\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}}{\sigma_{t_{i};\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}}&(2N^{(\mathrm{r})}+1<\ell_{0})\end{cases}, (41)

where

m_{t_{i};\ell_{0}}=\begin{cases}m_{t_{i};\ell_{0}}^{(\mathrm{R})}&(\ell_{0}\leq 2N^{(\mathrm{r})}+1)\\ m_{t_{i};\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}&(2N^{(\mathrm{r})}+1<\ell_{0})\end{cases}, (42)
\sigma_{t_{i};\ell_{0}}=\begin{cases}\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}&(\ell_{0}\leq 2N^{(\mathrm{r})}+1)\\ \sigma_{t_{i};\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}&(2N^{(\mathrm{r})}+1<\ell_{0})\end{cases}. (43)

Here, $m_{t_{i};\ell_{0}}$ and $\sigma_{t_{i};\ell_{0}}$ are the mean value and standard deviation over the total data set. The ANN potential can be represented as a KAN, as shown in Fig. 5.

Figure 5: The schematic of ANN potential. (a) The total energy is defined by the summation of atomic energies. (b) The atomic energy is given by the ANN, which is described by a KAN.

Let us discuss the relation between the KAN and ANN potential. We focus on the first layer shown in Eq. (40). This equation is rewritten as

x^{(1)}_{i;\ell_{1}}=\sigma^{(1)}\!\left(z^{(\mathrm{r})}_{i;\ell_{1}}+z^{(\mathrm{a})}_{i;\ell_{1}}+b_{t_{i};\ell_{1}}^{(1)}\right), (44)

where we define $z^{(\mathrm{r})}_{i;\ell_{1}}$ and $z^{(\mathrm{a})}_{i;\ell_{1}}$ as

z^{(\mathrm{r})}_{i;\ell_{1}}=\sum_{\ell_{0}=0}^{2N^{(\mathrm{r})}+1}W^{(1)}_{t_{i};\ell_{1}\ell_{0}}\frac{d_{i;\ell_{0}}^{(\mathrm{R})}-m_{t_{i};\ell_{0}}^{(\mathrm{R})}}{\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}}=\sum_{\ell_{0}=0}^{2N^{(\mathrm{r})}+1}\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{1}\ell_{0}}d_{i;\ell_{0}}^{(\mathrm{R})}+b_{t_{i};\ell_{1}}^{(\mathrm{R})}\,, (45)
\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{1}\ell_{0}}=\frac{W^{(1)}_{t_{i};\ell_{1}\ell_{0}}}{\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}}\,, (46)
b_{t_{i};\ell_{1}}^{(\mathrm{R})}=-\sum_{\ell_{0}=0}^{2N^{(\mathrm{r})}+1}W^{(1)}_{t_{i};\ell_{1}\ell_{0}}\frac{m_{t_{i};\ell_{0}}^{(\mathrm{R})}}{\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}}\,, (47)

and

z^{(\mathrm{a})}_{i;\ell_{1}}=\sum_{\ell_{0}^{\prime}=0}^{2N^{(\mathrm{a})}+1}W^{(1)}_{t_{i};\ell_{1},\ell_{0}^{\prime}+2N^{(\mathrm{r})}+2}\frac{d_{i;\ell_{0}^{\prime}}^{(\mathrm{A})}-m_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}{\sigma_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}=\sum_{\ell_{0}^{\prime}=0}^{2N^{(\mathrm{a})}+1}\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{1}\ell_{0}^{\prime}}d_{i;\ell_{0}^{\prime}}^{(\mathrm{A})}+b_{t_{i};\ell_{1}}^{(\mathrm{A})}\,, (48)
\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{1}\ell_{0}^{\prime}}=\frac{W^{(1)}_{t_{i};\ell_{1},\ell_{0}^{\prime}+2N^{(\mathrm{r})}+2}}{\sigma_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}\,, (49)
b_{t_{i};\ell_{1}}^{(\mathrm{A})}=-\sum_{\ell_{0}^{\prime}=0}^{2N^{(\mathrm{a})}+1}W^{(1)}_{t_{i};\ell_{1},\ell_{0}^{\prime}+2N^{(\mathrm{r})}+2}\frac{m_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}{\sigma_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}\,, (50)

respectively. By substituting Eqs. (25)–(28) into Eqs. (45) and (48), $z^{(\mathrm{r})}_{i;\ell_{1}}$ and $z^{(\mathrm{a})}_{i;\ell_{1}}$ are given as

z^{(\mathrm{r})}_{i;\ell_{1}}=\sum_{j=1,j\neq i}^{N}\Phi_{t_{i},t_{j};\ell_{1}}^{(\mathrm{r})}(\bar{R}_{ij})\,\bar{f}_{\mathrm{c}}(R_{ij})+b_{t_{i};\ell_{1}}^{(\mathrm{R})}, (51)
z^{(\mathrm{a})}_{i;\ell_{1}}=\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}\Phi_{t_{i},t_{j},t_{k};\ell_{1}}^{(\mathrm{a})}(c_{ijk})\,\bar{f}_{\mathrm{c}}(R_{ij})\bar{f}_{\mathrm{c}}(R_{ik})+b_{t_{i};\ell_{1}}^{(\mathrm{A})}, (52)

where

\Phi_{t_{i},t_{j};\ell_{1}}^{(\mathrm{r})}(\bar{R})=\phi_{t_{i};\ell_{1}}^{(\mathrm{r})}(\bar{R})+\phi_{t_{i};\ell_{1}}^{(\mathrm{r,w})}(\bar{R})w_{t_{j}}, (53)
\Phi_{t_{i},t_{j},t_{k};\ell_{1}}^{(\mathrm{a})}(\cos\theta)=\phi_{t_{i};\ell_{1}}^{(\mathrm{a})}(\cos\theta)+\phi_{t_{i};\ell_{1}}^{(\mathrm{a,w})}(\cos\theta)w_{t_{j}}w_{t_{k}}, (54)
\phi_{t_{i};\ell_{1}}^{(\mathrm{r})}(\bar{R})=\sum_{\ell_{0}=0}^{N^{(\mathrm{r})}}\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{1}\ell_{0}}T_{\ell_{0}}(\bar{R}), (55)
\phi_{t_{i};\ell_{1}}^{(\mathrm{r,w})}(\bar{R})=\sum_{\ell_{0}=0}^{N^{(\mathrm{r})}}\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{1},\ell_{0}+N^{(\mathrm{r})}+1}T_{\ell_{0}}(\bar{R}), (56)
\phi_{t_{i};\ell_{1}}^{(\mathrm{a})}(\cos\theta)=\sum_{\ell_{0}^{\prime}=0}^{N^{(\mathrm{a})}}\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{1}\ell_{0}^{\prime}}T_{\ell_{0}^{\prime}}(\cos\theta), (57)
\phi_{t_{i};\ell_{1}}^{(\mathrm{a,w})}(\cos\theta)=\sum_{\ell_{0}^{\prime}=0}^{N^{(\mathrm{a})}}\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{1},\ell_{0}^{\prime}+N^{(\mathrm{a})}+1}T_{\ell_{0}^{\prime}}(\cos\theta). (58)

In terms of KAN, the one-dimensional non-linear functions $\phi_{t_{i};\ell_{1}}^{(\mathrm{r})}(\bar{R})$, $\phi_{t_{i};\ell_{1}}^{(\mathrm{r,w})}(\bar{R})$, $\phi_{t_{i};\ell_{1}}^{(\mathrm{a})}(\cos\theta)$, and $\phi_{t_{i};\ell_{1}}^{(\mathrm{a,w})}(\cos\theta)$ are regarded as activation functions in the KAN structure. The above equations show that these non-linear functions are expanded in Chebyshev polynomials.

IV Kolmogorov–Arnold Network descriptor

IV.1 Definition

We propose a trainable descriptor called “KAN descriptor” for interatomic potentials, defined as

\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})=\begin{pmatrix}\tilde{z}_{i;1}(\pmb{R};R_{\mathrm{c}})\\ \tilde{z}_{i;2}(\pmb{R};R_{\mathrm{c}})\\ \vdots\\ \tilde{z}_{i;\tilde{N}_{0}}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}, (59)

where

\tilde{z}_{i;\ell_{0}}(\pmb{R};R_{\mathrm{c}})=\sum_{j=1,j\neq i}^{N}\tilde{\Psi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij};R_{\mathrm{c}})+\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}\tilde{\Psi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik};R_{\mathrm{c}})+\tilde{b}_{t_{i};\ell_{0}}\,, (60)
=\sum_{j=1,j\neq i}^{N}\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij})f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})+\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik})f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})f_{\mathrm{c}}(\pmb{R}_{ik};R_{\mathrm{c}})+\tilde{b}_{t_{i};\ell_{0}}. (61)

Here, $\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij})$ and $\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik})$ are radial (two-body) and angular (three-body) non-linear functions, which should be optimized.

The total energy is expressed as

E^{\mathrm{ANN}}(\pmb{R})=\sum_{i=1}^{N}\tilde{E}^{\mathrm{ANN}}_{t_{i}}(\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})), (62)

where $\tilde{E}^{\mathrm{ANN}}_{t_{i}}(\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}}))$ is the $i$-th atomic potential constructed by neural networks with the KAN descriptor $\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})$, as shown in Fig. 6. It should be noted that both the function $\tilde{E}^{\mathrm{ANN}}_{t_{i}}$ and the KAN descriptor $\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})$ are trainable. When one considers an MLP with one hidden layer, the $i$-th atomic energy is expressed as

\tilde{E}^{\mathrm{ANN}}_{t_{i}}(\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}}))=\sum_{\ell_{1}=1}^{\tilde{N}_{1}}\tilde{W}^{(2)}_{t_{i};\ell_{1}}\tilde{x}^{(1)}_{i;\ell_{1}}+\tilde{b}^{(2)}_{t_{i}}\,, (63)
\tilde{x}^{(1)}_{i;\ell_{1}}=\tilde{\sigma}^{(1)}\!\left(\sum_{\ell_{0}=1}^{\tilde{N}_{0}}\tilde{W}^{(1)}_{t_{i};\ell_{1}\ell_{0}}\tilde{x}^{(0)}_{i;\ell_{0}}+\tilde{b}^{(1)}_{t_{i};\ell_{1}}\right), (64)
\tilde{x}^{(0)}_{i;\ell_{0}}=\tilde{\sigma}^{(0)}\!\left(\tilde{z}_{i;\ell_{0}}(\pmb{R};R_{\mathrm{c}})\right). (65)

The network architecture with KAN descriptors is shown in Fig. 7.

Figure 6: The schematic of the $i_{0}$-th element of the Kolmogorov–Arnold network descriptor.
Figure 7: The schematic of artificial neural networks potential with the Kolmogorov–Arnold network descriptors.
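A compact sketch of Eqs. (63)–(65) is shown below (Julia with Flux.jl, the package used in Sec. V; the untrained weights, the random stand-in for the descriptor, and the choice of tanh for the activation functions are placeholders of our own).

using Flux

# One-hidden-layer MLP acting on a precomputed KAN descriptor, Eqs. (63)-(65).
# Sizes follow Sec. V: N0 = 10 descriptor components, N1 = 10 hidden units.
hidden = Dense(10 => 10, tanh)   # Eq. (64)
output = Dense(10 => 1)          # Eq. (63), linear readout

z   = randn(Float32, 10)         # stand-in for the KAN descriptor z_i
x0  = tanh.(z)                   # sigma^(0) applied elementwise, Eq. (65)
E_i = output(hidden(x0))[1]      # atomic energy contribution
println(E_i)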

In the original KAN layer shown in Eq. (8), the non-linear functions $\phi_{l,i_{l},i_{l-1}}(x)$ depend on the indices of the input and output neurons. In the KAN descriptor, however, the non-linear functions $\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij})$ and $\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik})$ do not depend on the index of the input neuron but only on the atomic species and the index of the output neuron, since atoms of the same type are indistinguishable. In Fig. 6, it looks as if there are $N_{i}$ distinct functions $\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}$ and $N_{i}^{2}$ distinct functions $\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}$ for each $\ell_{0}$. However, the number of non-linear functions in the KAN descriptor is usually much smaller than $N_{i}$ or $N_{i}^{2}$ and depends only on the number of atomic species. For example, consider 10 water molecules surrounding the $i$-th hydrogen atom in an H$_2$O system. In this case, only two $\tilde{\Phi}^{(\mathrm{r})}$ functions ($\tilde{\Phi}^{(\mathrm{r})}_{\mathrm{HH};\ell_{0}}$ and $\tilde{\Phi}^{(\mathrm{r})}_{\mathrm{HO};\ell_{0}}$) and three $\tilde{\Phi}^{(\mathrm{a})}$ functions ($\tilde{\Phi}^{(\mathrm{a})}_{\mathrm{HHH};\ell_{0}}$, $\tilde{\Phi}^{(\mathrm{a})}_{\mathrm{HHO};\ell_{0}}$, and $\tilde{\Phi}^{(\mathrm{a})}_{\mathrm{HOO};\ell_{0}}$) are required, although $N_{i}=30$.

These non-linear functions should be optimized to approximate the DFT energy. The non-linear functions can be expanded using B-spline functions, Gaussian radial basis functions, or Chebyshev polynomials. During the training procedure, the coefficients of the non-linear functions are adjusted. Once accurate non-linear functions with appropriate coefficients are obtained, their values can be evaluated using linear or spline interpolation. In this study, we employ the Chebyshev polynomials as

\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij})=\tilde{\varphi}^{(\mathrm{r},0)}_{t_{i};\ell_{0}}(\bar{R}_{ij})+\tilde{\varphi}^{(\mathrm{r},1)}_{t_{i};\ell_{0}}(\bar{R}_{ij})w_{t_{j}}, (66)
\tilde{\varphi}^{(\mathrm{r},n)}_{t_{i};\ell_{0}}(R)=\sum_{s=0}^{\tilde{N}^{(\mathrm{r})}}\tilde{W}^{(\mathrm{r},n)}_{t_{i};\ell_{0}s}\,T_{s}(R), (67)
\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik})=\tilde{\varphi}^{(\mathrm{a},0)}_{t_{i};\ell_{0}}(c_{ijk})+\tilde{\varphi}^{(\mathrm{a},1)}_{t_{i};\ell_{0}}(c_{ijk})w_{t_{j}}w_{t_{k}}, (68)
\tilde{\varphi}^{(\mathrm{a},n)}_{t_{i};\ell_{0}}(\cos\theta)=\sum_{s=0}^{\tilde{N}^{(\mathrm{a})}}\tilde{W}^{(\mathrm{a},n)}_{t_{i};\ell_{0}s}\,T_{s}(\cos\theta). (69)

We now comment on the representation ability of the non-linear functions. The number of trainable parameters in the KAN descriptor $\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})$ is $2\tilde{N}_{0}(\tilde{N}^{(\mathrm{r})}+1)+2\tilde{N}_{0}(\tilde{N}^{(\mathrm{a})}+1)$. In addition, these non-linear functions depend on the atomic species in the same way as the original Chebyshev descriptor [11]. Therefore, the KAN descriptor avoids the combinatorial growth with respect to the number of atomic species that affects the symmetry functions [8].
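For example, with the settings used in Sec. V for a single element, $\tilde{N}_{0}=10$ and $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(20,20)$, this count amounts to $2\times 10\times 21+2\times 10\times 21=840$ trainable parameters in the descriptor.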

IV.2 Relation between the Chebyshev and the Kolmogorov–Arnold network descriptors

The relations between the coefficients of the non-linear functions of the KAN descriptor and the coefficients defined in Eqs. (45)–(50) are expressed as

\tilde{b}_{t_{i};\ell_{0}}=b_{t_{i};\ell_{0}}^{(1)}+b_{t_{i};\ell_{0}}^{(\mathrm{R})}+b_{t_{i};\ell_{0}}^{(\mathrm{A})}, (70)
\tilde{W}^{(\mathrm{r},0)}_{t_{i};\ell_{0}n}=\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{0}n}, (71)
\tilde{W}^{(\mathrm{r},1)}_{t_{i};\ell_{0}n}=\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{0},n+\tilde{N}^{(\mathrm{r})}+1}, (72)
\tilde{W}^{(\mathrm{a},0)}_{t_{i};\ell_{0}n}=\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{0}n}, (73)
\tilde{W}^{(\mathrm{a},1)}_{t_{i};\ell_{0}n}=\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{0},n+\tilde{N}^{(\mathrm{a})}+1}. (74)

In the previous Chebyshev descriptor, $\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}$, $\sigma_{t_{i};\ell_{0}}^{(\mathrm{A})}$, $m_{t_{i};\ell_{0}}^{(\mathrm{R})}$, and $m_{t_{i};\ell_{0}}^{(\mathrm{A})}$ are computed from the data set. In the KAN descriptor, these variables are trainable. We note that, since the KAN descriptor is trainable, an MLP with two hidden layers acting on the Chebyshev descriptor can be regarded as an MLP with one hidden layer acting on the KAN descriptor.

V Results

V.1 Target system and method

We consider iron with vacancies and dislocations. We use the training dataset available on GitHub, as described in Ref. [29]. The KAN descriptor and MLP are implemented using the machine-learning package Flux.jl version 0.14.11, written in the Julia language [30, 31]. The sizes of the total dataset, training dataset, and test dataset are 6348, 5713, and 635, respectively. The energy of each atomic structure is calculated using Quantum Espresso, a first-principles calculation tool based on density functional theory [32]. The details of the calculation are provided in Ref. [29]. We adopt an MLP with one hidden layer for $\tilde{E}_{i}^{\mathrm{ANN}}(\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}}))$. The dimension of the input vector $\tilde{\pmb{z}}$ and the number of units in the hidden layer are set to $\tilde{N}_{0}=10$ and $\tilde{N}_{1}=10$, respectively. To train the ANN potential, we use the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm implemented in the Optim.jl package [33]. For each optimization with L-BFGS, we perform 10 iterations to reduce the loss function defined as

\mathrm{MSE}=\frac{1}{N_{\mathrm{batch}}}\sum_{k=1}^{N_{\mathrm{batch}}}\left(E^{\mathrm{FP}(k)}-E^{\mathrm{ANN}(k)}\right)^{2}. (75)

Here, the batch size $N_{\mathrm{batch}}$ used with the Optim.jl package is 100. $E^{\mathrm{FP}(k)}$ and $E^{\mathrm{ANN}(k)}$ are the potential energies calculated by first-principles calculation and by the ANN, respectively.
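A condensed sketch of this training setup is given below (Julia with Optim.jl, the optimizer package named above; the linear model, the random toy data, and all variable names are placeholders of our own, so only the mean-squared-error loss of Eq. (75) and the 10-iteration L-BFGS call mirror the actual procedure).

using Optim

# Placeholder model: a linear map standing in for the ANN potential with KAN descriptors.
model(p, x) = sum(p .* x)

# Toy batch of descriptor vectors and reference (first-principles) energies.
xs = [randn(10) for _ in 1:100]
Es = randn(100)

# Mean-squared error over the batch, Eq. (75).
mse(p) = sum((Es[k] - model(p, xs[k]))^2 for k in 1:length(xs)) / length(xs)

# 10 L-BFGS iterations per optimization call, as in the text.
p0  = zeros(10)
res = optimize(mse, p0, LBFGS(), Optim.Options(iterations = 10))
println(Optim.minimum(res))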

V.2 Shape of the non-linear functions in KAN descriptors

Let us discuss the shape of the non-linear functions in the KAN descriptors. We consider two parameter sets, $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(20,20)$ and $(50,20)$. As shown in Fig. 8, the shape of the non-linear functions $\tilde{\Phi}^{(\mathrm{r})}_{\mathrm{FeFe};\ell_{0}}(\pmb{R}_{ij})f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})$ in the KAN descriptors depends on $\tilde{N}^{(\mathrm{r})}$.

Figure 8: Two-body non-linear functions $\tilde{\Phi}^{(\mathrm{r})}_{\mathrm{FeFe};\ell_{0}}(\pmb{R}_{ij})f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})$ in the KAN descriptor. (Top panel) $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(20,20)$ and (bottom panel) $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(50,20)$.

In the region $R_{ij}<2$ [Å], the curves depend strongly on $\tilde{N}^{(\mathrm{r})}$. It should be noted that there are no data with $R_{ij}<2$ [Å] in the training and test data sets. As shown in Fig. 9, the distribution of the interatomic distance $R_{ij}$ is not uniform. In regions where no data are found in the training data set, the shape of the non-linear function in the KAN descriptor cannot affect the quality of the ANN potential.

Figure 9: Radial distribution function in the total data set.

In terms of KANs, if the non-linear functions are well optimized for a fixed number of functions, the loss should converge as the number of Chebyshev polynomials increases. As shown in Fig. 10, the limit of the expressive power of the KAN descriptors is reached as the number of Chebyshev polynomials for the radial non-linear functions increases.

Figure 10: $\tilde{N}^{(\mathrm{r})}$ dependence of the mean absolute error. We fix the number of angular Chebyshev basis functions ($\tilde{N}^{(\mathrm{a})}=20$), the dimension of the input vector constructed from the KAN descriptors, and the number of units in the hidden layer of the neural network.

V.3 Fast evaluation of the Kolmogorov–Arnold network descriptors

We propose a method for fast evaluation of the ANN potential with KAN descriptors. The conventional ANN potential constructed from the Chebyshev polynomial descriptor can be regarded as an ANN potential with KAN descriptors. The computational cost of evaluating the KAN descriptor is $\mathcal{O}(\tilde{N}^{(\mathrm{r})})+\mathcal{O}(\tilde{N}^{(\mathrm{a})})$. With the use of linear or spline interpolation on $N$ points, the computational cost becomes $\mathcal{O}(\log N)$, which does not depend on the order of the Chebyshev polynomials. We note that the computational cost of the forces can also be reduced with the use of interpolations.
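A minimal sketch of this replacement is shown below (Julia, assuming the linear_interpolation and cubic_spline_interpolation convenience constructors of Interpolations.jl, the package used in Sec. V; the "trained" function is a toy Chebyshev expansion of our own): the non-linear function is tabulated once on a grid and then evaluated through an interpolant instead of re-summing the polynomial at every distance.

using Interpolations

# Toy "trained" radial non-linear function: a fixed Chebyshev expansion on [-1, 1].
coeffs = [0.3, -0.8, 0.5, 0.1, -0.2]
phi(x) = sum(coeffs[s + 1] * cos(s * acos(clamp(x, -1.0, 1.0))) for s in 0:length(coeffs) - 1)

# Tabulate once on a uniform grid, then build the interpolants.
xs  = range(-1.0, 1.0; length = 200)
ys  = phi.(xs)
lin = linear_interpolation(xs, ys)
spl = cubic_spline_interpolation(xs, ys)

x = 0.37
println((phi(x), lin(x), spl(x)))   # direct Chebyshev sum vs. interpolated values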

We now discuss how the computational time and accuracy depend on the number of interpolation points. In the middle panel of Fig. 11, we show the residual norm of the KAN descriptors, defined as

\frac{\left\|\tilde{\pmb{z}}-\tilde{\pmb{z}}^{\mathrm{interpolation}}\right\|}{\|\tilde{\pmb{z}}\|}, (76)

where $\tilde{\pmb{z}}^{\mathrm{interpolation}}$ is the input vector constructed by interpolation, and $\|\pmb{A}\|$ denotes the Euclidean norm. We adopt the linear interpolation and the natural cubic spline interpolation implemented in Interpolations.jl. The elapsed time is measured on a MacBook Pro (14 inch, 2023) with an Apple M2 Max processor. The parameters of the Chebyshev polynomials are $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(50,20)$. We should note that our Julia code might not be fully optimized; therefore, we discuss the elapsed time only qualitatively. As shown in Fig. 11, the elapsed time of each interpolation depends only weakly on $N$, which is consistent with the theoretical $\mathcal{O}(\log N)$ cost. We also discuss the mean absolute error (MAE), defined as

\mathrm{MAE}=\frac{1}{M}\sum_{k=1}^{M}\left|E^{(k)}-E^{\mathrm{ANN}(k)}\right|, (77)

where $M$ is the size of the test data set. To assess the accuracy of the interpolated KAN descriptors, we calculate the difference between the MAE of the ANN potential with Chebyshev polynomials and that with the interpolated KAN descriptors. The MAE of the Chebyshev descriptors is 2.05796 meV/atom. As shown in the bottom panel of Fig. 11, the difference decreases as the number of interpolation points increases.

Figure 11: (Top panel) Elapsed time to calculate the mean absolute error of the test data. (Middle panel) The residual norm of the KAN descriptors. (Bottom panel) The difference between the MAE of the Chebyshev descriptors and that of the KAN descriptors. The parameters are $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(50,20)$.

We should note that, for actual machine-learning MD simulations, it would be better to implement the KAN descriptor in an open-source ANN potential package such as ænet [11]. Since the implementation should not be complicated, we plan to provide such software in the future.

VI Conclusion

In this study, we have integrated KAN into MD simulations to enhance the efficiency of potential energy models. Our investigation reveals that the KAN framework can represent commonly used potentials, such as the LJ, EAM, and ANN potentials. This reinterpretation allows for the application of KAN’s non-linear functions to introduce the KAN descriptor for ANN potentials. Our results demonstrate that by employing linear or cubic spline interpolations for these KAN functions, computational savings can be achieved without reducing accuracy. Future work will focus on further refining the KAN descriptors and integrating them into open-source ANN potential packages such as ænet.

Acknowledgements.
This work was partially supported by JSPS KAKENHI Grant Nos. 22H05111, 22H05114, 22K03539, and 23H03996. This work was also partially supported by the "Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN)" and the "High Performance Computing Infrastructure (HPCI)" in Japan (Project IDs: jh240028 and jh240059). We would like to thank Dr. Mitsuhiro Itakura for valuable discussions. Additionally, we are grateful to Dr. Hideki Mori for providing information about his training dataset for iron with vacancies and dislocations.

References

  • Rahman and Stillinger [1971] A. Rahman and F. H. Stillinger, Molecular dynamics study of liquid water, J. Chem. Phys. 55, 3336 (1971).
  • Wang et al. [2010] S. Wang, S. Li, Z. Cao, and T. Yan, Molecular dynamic simulations of ionic liquids at graphite surface, J. Phys. Chem. C Nanomater. Interfaces 114, 990 (2010).
  • Kubicki and Lasaga [1988] J. Kubicki and A. Lasaga, Molecular dynamics simulations of SiO 2 melt and glass; ionic and covalent models, American Mineralogist 73, 941 (1988).
  • Engel et al. [2015] M. Engel, P. F. Damasceno, C. L. Phillips, and S. C. Glotzer, Computational self-assembly of a one-component icosahedral quasicrystal, Nat. Mater. 14, 109 (2015).
  • Kobayashi et al. [2023] K. Kobayashi, M. Okumura, H. Nakamura, M. Itakura, M. Machida, S. Urata, and K. Suzuya, Machine learning molecular dynamics reveals the structural origin of the first sharp diffraction peak in high-density silica glasses, Sci. Rep. 13, 18721 (2023).
  • Iftimie et al. [2005] R. Iftimie, P. Minary, and M. E. Tuckerman, Ab initio molecular dynamics: concepts, recent developments, and future trends, Proc. Natl. Acad. Sci. U. S. A. 102, 6654 (2005).
  • Mouvet et al. [2022] F. Mouvet, J. Villard, V. Bolnykh, and U. Rothlisberger, Recent advances in first-principles based molecular dynamics, Acc. Chem. Res. 55, 221 (2022).
  • Behler and Parrinello [2007] J. Behler and M. Parrinello, Generalized neural-network representation of high-dimensional potential-energy surfaces, Phys. Rev. Lett. 98, 146401 (2007).
  • Nagai et al. [2020] Y. Nagai, M. Okumura, K. Kobayashi, and M. Shiga, Self-learning hybrid monte carlo: A first-principles approach, Phys. Rev. B Condens. Matter 102, 041124 (2020).
  • Nagai et al. [2024] Y. Nagai, Y. Iwasaki, K. Kitahara, Y. Takagiwa, K. Kimura, and M. Shiga, High-Temperature atomic diffusion and specific heat in quasicrystals, Phys. Rev. Lett. 132, 196301 (2024).
  • Artrith et al. [2017] N. Artrith, A. Urban, and G. Ceder, Efficient and accurate machine-learning interpolation of atomic energies in compositions with many species, Phys. Rev. B 96, 014112 (2017).
  • Batzner et al. [2022] S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials, Nat. Commun. 13, 2453 (2022).
  • Musaelian et al. [2023] A. Musaelian, S. Batzner, A. Johansson, L. Sun, C. J. Owen, M. Kornbluth, and B. Kozinsky, Learning local equivariant representations for large-scale atomistic dynamics, Nat. Commun. 14, 579 (2023).
  • Bartók et al. [2010] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons, Phys. Rev. Lett. 104, 136403 (2010).
  • Thölke and De Fabritiis [2022] P. Thölke and G. De Fabritiis, TorchMD-NET: Equivariant transformers for neural network based molecular potentials, arXiv [cs.LG]  (2022).
  • Sivaraman et al. [2020] G. Sivaraman, A. N. Krishnamoorthy, M. Baur, C. Holm, M. Stan, G. Csányi, C. Benmore, and Á. Vázquez-Mayagoitia, Machine-learned interatomic potentials by active learning: amorphous and liquid hafnium dioxide, Npj Comput. Mater. 6, 1 (2020).
  • Wang et al. [2024] Y. Wang, T. Wang, S. Li, X. He, M. Li, Z. Wang, N. Zheng, B. Shao, and T.-Y. Liu, Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing, Nat. Commun. 15, 313 (2024).
  • Liu et al. [2024] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljačić, T. Y. Hou, and M. Tegmark, KAN: Kolmogorov-arnold networks, arXiv [cs.LG]  (2024).
  • Kolmogorov [1956] A. Kolmogorov, On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables, Proceedings of the USSR Academy of Sciences 108, 179 (1956).
  • Braun and Griebel [2009] J. Braun and M. Griebel, On a constructive proof of kolmogorov’s superposition theorem, Constr. Approx. 30, 653 (2009).
  • Bhattacharjee [2024] S. S. Bhattacharjee, Torchkan: Simplified kan model with variations, https://github.com/1ssb/torchkan/ (2024).
  • Nagai [2024] Y. Nagai, Fluxkan: Julia version of the torchkan, https://github.com/cometscome/FluxKAN.jl (2024).
  • Ss et al. [2024] S. Ss, K. Ar, G. R, and A. Kp, Chebyshev polynomial-based kolmogorov-arnold networks: An efficient architecture for nonlinear function approximation, arXiv [cs.LG]  (2024).
  • Li [2024] Z. Li, Kolmogorov-arnold networks are radial basis function networks, arXiv [cs.LG]  (2024).
  • Schwerdtfeger and Wales [2024] P. Schwerdtfeger and D. J. Wales, 100 years of the lennard-jones potential, J. Chem. Theory Comput. 20, 3379 (2024).
  • Daw and Baskes [1984] M. S. Daw and M. I. Baskes, Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals, Phys. Rev. B 29, 6443 (1984).
  • Takamoto et al. [2022] S. Takamoto, C. Shinagawa, D. Motoki, K. Nakago, W. Li, I. Kurata, T. Watanabe, Y. Yayama, H. Iriguchi, Y. Asano, T. Onodera, T. Ishii, T. Kudo, H. Ono, R. Sawada, R. Ishitani, M. Ong, T. Yamaguchi, T. Kataoka, A. Hayashi, N. Charoenphakdee, and T. Ibuka, Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements, Nat. Commun. 13, 2991 (2022).
  • Batatia et al. [2022] I. Batatia, D. P. Kovács, G. N. C. Simm, C. Ortner, and G. Csányi, MACE: Higher order equivariant message passing neural networks for fast and accurate force fields, arXiv [stat.ML]  (2022).
  • Mori et al. [2023] H. Mori, T. Tsuru, M. Okumura, D. Matsunaka, Y. Shiihara, and M. Itakura, Dynamic interaction between dislocations and obstacles in bcc iron based on atomic potentials derived using neural networks, Phys. Rev. Mater. 7, 063605 (2023).
  • Innes et al. [2018] M. Innes, E. Saba, K. Fischer, D. Gandhi, M. C. Rudilosso, N. M. Joy, T. Karmali, A. Pal, and V. Shah, Fashionable modelling with flux, CoRR abs/1811.01457 (2018), arXiv:1811.01457.
  • Innes [2018] M. Innes, Flux: Elegant machine learning with julia, Journal of Open Source Software 10.21105/joss.00602 (2018).
  • Giannozzi et al. [2009] P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. Dal Corso, S. de Gironcoli, S. Fabris, G. Fratesi, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov, P. Umari, and R. M. Wentzcovitch, QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials, J. Phys. Condens. Matter 21, 395502 (2009).
  • Mogensen and Riseth [2018] P. K. Mogensen and A. N. Riseth, Optim: A mathematical optimization package for Julia, Journal of Open Source Software 3, 615 (2018).