
Kolmogorov–Arnold networks in molecular dynamics

Yuki Nagai Information Technology Center, The University of Tokyo, 6–2–3 Kashiwanoha, Kashiwa, Chiba 277–0882, Japan [email protected]    Masahiko Okumura Center for Computational Science and e-Systems, Japan Atomic Energy Agency, Kashiwa, Chiba 277–0871, Japan
Abstract

We explore the integration of Kolmogorov–Arnold Networks (KANs) into molecular dynamics (MD) simulations to improve interatomic potentials. We propose that widely used potentials, such as the Lennard–Jones (LJ) potential, the embedded atom model (EAM), and artificial neural network (ANN) potentials, can be interpreted within the KAN framework. Specifically, we demonstrate that the descriptors for ANN potentials, typically constructed using polynomials, can be redefined using KAN’s non-linear functions. By employing linear or cubic spline interpolations for these KAN functions, we show that the computational cost of evaluating ANN potentials and their derivatives is reduced.

preprint: APS/123-QED

I Introduction

Molecular dynamics (MD) simulations have revolutionized the field of materials science, providing detailed atomistic insights into the structure, properties, and behavior of materials under various conditions [1, 2, 3, 4, 5]. By solving Newton’s equations of motion for a system of interacting atoms, MD simulations can model processes such as diffusion, phase transitions, mechanical deformation, and chemical reactions. Despite their widespread utility, traditional MD simulations are often limited by the accuracy and efficiency of the force fields used to approximate interatomic forces.

Ab initio molecular dynamics (AIMD), particularly those based on density functional theory (DFT), offer enhanced accuracy by explicitly accounting for the electronic structure of materials [6, 7]. However, the computational cost of AIMD is prohibitive for simulating large systems or long-time scales, restricting its application to relatively small and short simulations. This limitation underscores the need for more efficient computational methods that do not compromise on accuracy.

Machine learning (ML) has emerged as a transformative approach to address these challenges in MD simulations. By training on energies obtained from DFT calculations, ML models, particularly artificial neural networks (ANNs), can accurately predict potential energy surfaces and interatomic forces [8]. Integrating ML with MD, known as machine learning molecular dynamics (MLMD), offers a promising pathway to enhance both the accuracy and efficiency of simulations [9, 10, 11, 12, 13, 14].

Developing novel and effective ML methods tailored specifically for MD simulations is crucial to further unlocking the potential of MLMD [14, 9, 12, 13, 15, 16, 17]. For MLMD, multi-layer perceptrons (MLPs) are mainly used as the ANN architecture. Recently, Kolmogorov–Arnold networks (KANs) have been proposed as an alternative to MLPs [18]. They are an extension of the Kolmogorov–Arnold representation theorem [19, 20], which states that any continuous multivariate function can be represented as a superposition of continuous functions of a single variable. This theoretical framework provides a powerful tool for decomposing complex functions into simpler, more manageable component functions.

In this paper, we explore the relation between KANs and potential energies in molecular dynamics. From the KAN perspective, the Lennard–Jones (LJ) potential, the embedded atom model (EAM), and ANN potentials can all be viewed as forms of KANs. Utilizing linear or cubic spline interpolations, we can reduce the computational cost of the ANN potential.

This paper is organized as follows. In Sec. II, we introduce KANs. In Sec. III, we discuss the relation between KANs and the potentials used in molecular dynamics. The KAN descriptor is introduced in Sec. IV. The numerical results are shown in Sec. V, where we also introduce linear and cubic spline interpolations to approximate the KAN descriptor and speed up simulations. The conclusion is given in Sec. VI.

II Kolmogorov–Arnold Networks

II.1 Kolmogorov–Arnold representation theorem

Recently, KANs have been proposed, inspired by the Kolmogorov–Arnold representation theorem [18]. The Kolmogorov–Arnold representation theorem shows that a multivariate function $f(\pmb{x})=f(x_{1},x_{2},\ldots,x_{n})$ can be represented by a superposition of univariate functions [19, 20]. For a smooth $f:[0,1]^{n}\rightarrow\mathbb{R}$, the function $f(\pmb{x})$ is represented as

f(\pmb{x})=\sum_{q=1}^{2n+1}\Phi_{q}\!\left(\sum_{p=1}^{n}\phi_{q,p}(x_{p})\right), (1)

where $\phi_{q,p}:[0,1]\rightarrow\mathbb{R}$ and $\Phi_{q}:\mathbb{R}\rightarrow\mathbb{R}$ (see Fig. 1).

Figure 1: The schematic of the Kolmogorov–Arnold representation theorem.

For example, one can easily confirm that the multiplication $x_{1}x_{2}$ can be represented by a superposition of univariate functions in the form of Eq. (1),

x_{1}x_{2}=\exp\!\left(\sum_{i=1}^{2}\log x_{i}\right). (2)
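As a quick numerical check of Eq. (2), the following snippet (plain Julia; the test values are arbitrary choices of ours) compares the direct product with the superposition of the two univariate functions log and exp.

# Check Eq. (2): x1*x2 = exp(log(x1) + log(x2)) for positive inputs.
x1, x2 = 0.7, 1.9                  # arbitrary positive test values
inner = log(x1) + log(x2)          # inner univariate functions: phi(x) = log(x)
outer = exp(inner)                 # outer univariate function: Phi(y) = exp(y)
println(outer ≈ x1 * x2)           # prints true (up to floating-point rounding)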

II.2 Kolmogorov–Arnold Networks

Liu et al. have extended the idea of the Kolmogorov–Arnold representation theorem [18]. By regarding Eq. (1) as a two-layer network with one hidden layer of $2n+1$ neurons, a more general form of the network can be given by

f({\bm{x}})=\sum_{i_{L}=1}^{n_{L}}\Phi_{i_{L}}\!\left(x^{(L)}_{i_{L}}\right), (3)
x^{(L)}_{i_{L}}=\sum_{i_{L-1}=1}^{n_{L-1}}\phi_{L-1,i_{L},i_{L-1}}\!\left(x_{i_{L-1}}^{(L-1)}\right), (4)
x^{(L-1)}_{i_{L-1}}=\sum_{i_{L-2}=1}^{n_{L-2}}\phi_{L-2,i_{L-1},i_{L-2}}\!\left(x_{i_{L-2}}^{(L-2)}\right), (5)
\vdots (6)
x^{(2)}_{i_{2}}=\sum_{i_{1}=1}^{n_{1}}\phi_{1,i_{2},i_{1}}\!\left(x_{i_{1}}^{(1)}\right), (7)
x^{(1)}_{i_{1}}=\sum_{i_{0}=1}^{n_{0}}\phi_{0,i_{1},i_{0}}(x_{i_{0}})\,, (8)

which is called KAN. The schematic of KAN is shown in Fig. 2.

Figure 2: The schematic of Kolmogorov–Arnold networks.

The multivariate function $f({\bm{x}})$ is represented by a superposition of non-linear "activation" functions $\phi_{l,qp}(x)$. To construct the function $f({\bm{x}})$, the shapes of the non-linear functions $\phi_{l,qp}(x)$ are optimized. For example, in the implementation by Liu et al., each KAN layer is composed of the sum of a spline function and the SiLU activation function. Various kinds of basis functions can be used, such as Legendre polynomials, Chebyshev polynomials, and Gaussian radial basis functions [21, 22, 23, 24].
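As an illustration of Eqs. (3)–(8), a minimal sketch of a single KAN layer is given below (plain Julia; the names chebyshev and kan_layer, the choice of a Chebyshev basis, and the random coefficients are our own, not the implementation of Ref. [18]). Each activation function is expanded in a small basis whose coefficients would be trained.

# Chebyshev polynomials T_0(x), ..., T_S(x) evaluated at x in [-1, 1].
chebyshev(x, S) = [cos(s * acos(clamp(x, -1.0, 1.0))) for s in 0:S]

# One KAN layer: y_i = sum_j phi_{i,j}(x_j), with each phi_{i,j} expanded in
# S + 1 Chebyshev coefficients stored in W[i, j, :].
function kan_layer(x::Vector{Float64}, W::Array{Float64,3})
    n_out, n_in, S1 = size(W)
    y = zeros(n_out)
    for i in 1:n_out, j in 1:n_in
        T = chebyshev(x[j], S1 - 1)
        y[i] += sum(W[i, j, :] .* T)
    end
    return y
end

W = 0.1 .* randn(4, 3, 6)   # 3 inputs -> 4 outputs, 6 Chebyshev coefficients per function
println(kan_layer([0.2, -0.5, 0.9], W))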

III Potential energy in molecular dynamics

III.1 Various kinds of potential energies

In the field of molecular dynamics, accurately estimating the forces on each atom is crucial. The force on the $i$-th atom at $\pmb{R}_{i}$ is derived from the derivative of the total potential energy $E$, i.e.,

\pmb{F}_{i}=-\frac{\partial E}{\partial\pmb{R}_{i}}\,. (9)

An accurate and fast estimation of the potential energy is therefore one of the most important challenges.
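To make Eq. (9) concrete, the sketch below (plain Julia; the toy pair energy and the finite-difference evaluation are our own illustrations, whereas production MD codes evaluate the derivative analytically) computes the force on one atom as the negative gradient of the total energy.

# Toy total energy: sum of 1/R_ij over all pairs (illustration only).
function toy_energy(pos)
    N = length(pos)
    e = 0.0
    for i in 1:N-1, j in i+1:N
        e += 1 / sqrt(sum((pos[i] .- pos[j]) .^ 2))
    end
    return e
end

# Force on atom i from Eq. (9), F_i = -dE/dR_i, here by central finite differences.
function force(energy, pos, i; h = 1e-6)
    F = zeros(3)
    for a in 1:3
        p1 = deepcopy(pos); p1[i][a] += h
        p2 = deepcopy(pos); p2[i][a] -= h
        F[a] = -(energy(p1) - energy(p2)) / (2h)
    end
    return F
end

pos = [[0.0, 0.0, 0.0], [1.5, 0.2, 0.0], [0.1, 1.4, 0.3]]
println(force(toy_energy, pos, 1))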

Various kinds of potentials have been proposed for molecular dynamics. For example, the LJ potential is a simplified model that describes the essential features of interactions between simple atoms and molecules [25]. The EAM is an approximation describing the interatomic potential and is particularly appropriate for metallic systems [26].

Calculations based on DFT, known as ab initio calculations, are among the most accurate methods for evaluating potential energy. AIMD has many applications in material design. However, reducing the computational effort required for AIMD remains a key issue for its broader application to phenomena on large length and time scales.

The use of ANNs, which imitate DFT energies through machine learning, is seen as a promising solution to this issue [8]. Research in machine-learning molecular dynamics has grown rapidly over the last decade [14, 9, 12, 13, 15, 16, 17, 11]. Various network structures have been proposed, including graph neural networks [27, 28, 9, 10, 11, 12, 13, 14] and transformer architectures [15].

III.2 Relation between Kolmogorov–Arnold networks and potential energy

III.2.1 Lennard–Jones potential

Let us discuss the relation between KAN and potential energy in molecular dynamics. Using the LJ potential, the total potential energy for $N$ atoms is expressed as

E^{\mathrm{LJ}}(\pmb{R})=\sum_{i=1}^{N}E_{i}^{\mathrm{LJ}}(\pmb{R})\,, (10)
E_{i}^{\mathrm{LJ}}(\pmb{R})=\frac{1}{2}\sum_{j=1,j\neq i}^{N}V_{\mathrm{LJ}}(R_{ij})\,, (11)

where, with empirically determined coefficients $A_{m}$ and $B_{n}$,

V_{\mathrm{LJ}}(R)=\frac{A_{m}}{R^{m}}-\frac{B_{n}}{R^{n}}\,, (12)
R_{ij}=|\pmb{R}_{ij}|\,, (13)
\pmb{R}_{ij}=\pmb{R}_{j}-\pmb{R}_{i}\,. (14)

The LJ potential can be represented by a type of KAN, as shown in Fig. 3.

Figure 3: The schematic of Lennard–Jones potential.

As shown in Eq. (3), a non-linear function generally depends on the indices of both the input and output neurons, $i_{L-1}$ and $i_{L}$. In the LJ potential model, the elements of the initial input are the distances between pairs of atoms, which should have permutation symmetry. Therefore, all non-linear functions are $V_{\mathrm{LJ}}(R)$, similar to weight sharing in MLP networks. We call this feature "function sharing."
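A minimal sketch of this function sharing is given below (plain Julia; the coefficients and atomic positions are arbitrary test values of ours): the single univariate function V_LJ of Eq. (12) is applied to every pair distance and the results are summed as in Eqs. (10) and (11).

# Lennard-Jones pair function V(R) = A/R^m - B/R^n, Eq. (12).
lj(R; A = 1.0, B = 1.0, m = 12, n = 6) = A / R^m - B / R^n

# Total energy, Eqs. (10)-(11): the same univariate function is shared by all pairs.
function lj_energy(positions::Vector{Vector{Float64}})
    N = length(positions)
    E = 0.0
    for i in 1:N, j in 1:N
        i == j && continue
        Rij = sqrt(sum((positions[j] .- positions[i]) .^ 2))
        E += 0.5 * lj(Rij)
    end
    return E
end

positions = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0]]
println(lj_energy(positions))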

III.2.2 Embedded atom model potential

The total potential energy in the EAM is divided into two contributions, namely a pairwise part and a local density part:

E^{\mathrm{EAM}}(\pmb{R})=\sum_{i=1}^{N}E_{i}^{\mathrm{EAM}}({\bm{R}})\,, (15)
E_{i}^{\mathrm{EAM}}({\bm{R}})=F_{t_{i}}\!\left(n_{i}(\pmb{R})\right)+\frac{1}{2}\sum_{j=1,j\neq i}^{N}\phi_{t_{i}t_{j}}(R_{ij})\,, (16)
n_{i}(\pmb{R})=\sum_{j=1,j\neq i}^{N}\rho_{t_{j}}(R_{ij})\,. (17)

The functions $F_{\alpha}(n)$, $\phi_{\alpha\beta}(R)$, and $\rho_{\alpha}(R)$ are constructed empirically or by fitting to potential energies calculated by DFT. In actual simulations, the values of these functions are evaluated by linear interpolation. In terms of KAN, the local potential energy $E_{i}^{\mathrm{EAM}}$ can be rewritten as

E_{i}^{\mathrm{EAM}}(\pmb{R})=F_{t_{i}}\!\left(n_{i}(\pmb{R})\right)+I\left(m_{i}(\pmb{R})\right)\,, (18)
I(x)=x\,, (19)
m_{i}(\pmb{R})=\sum_{j=1,j\neq i}^{N}\frac{1}{2}\phi_{t_{i}t_{j}}(R_{ij})\,, (20)

where $t_{i}$ is the element type of the $i$-th atom. The EAM potential can be represented by a type of KAN, as shown in Fig. 4.

Figure 4: The schematic of the EAM potential.

In terms of KAN, the EAM potential can be extended by adding more hidden layers and/or units.
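The two-layer structure of Eqs. (15)–(20) can be sketched as follows (plain Julia for a single element; the functional forms of F, phi, and rho are toy choices of ours, not fitted ones): the inner functions rho are summed into a density, and the outer function F is applied to that sum.

# Toy single-element EAM functions (illustrative forms only, not fitted).
rho(R) = exp(-R)           # electron-density contribution rho(R)
phi(R) = exp(-2R) / R      # pair term phi(R)
F(n)   = -sqrt(n)          # embedding function F(n)

function eam_energy(positions::Vector{Vector{Float64}})
    N = length(positions)
    E = 0.0
    for i in 1:N
        n_i, m_i = 0.0, 0.0
        for j in 1:N
            i == j && continue
            Rij = sqrt(sum((positions[j] .- positions[i]) .^ 2))
            n_i += rho(Rij)         # density, Eq. (17)
            m_i += 0.5 * phi(Rij)   # pairwise part, Eq. (20)
        end
        E += F(n_i) + m_i           # Eq. (18) with I(x) = x
    end
    return E
end

println(eam_energy([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 2.5, 0.0]]))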

III.2.3 Artificial neural network potential

There are various kinds of ANN potentials. In this paper, we focus on Behler–Parrinello type neural networks [8]. In their formulation, the potential energy is assumed to be constructed from a sum of fictitious atomic energies, similar to the LJ and EAM potentials. Furthermore, the atomic energy is considered a functional of the atomic environment around the central atom, defined inside a sphere with a cutoff radius $R_{\mathrm{c}}$. Feature values of the atomic environment, called descriptors, are introduced to ensure the translational and rotational invariance of the atomic energy. Therefore, the total and atomic energies are given as

E^{\mathrm{ANN}}(\pmb{R})=\sum_{i=1}^{N}E_{t_{i}}^{\mathrm{ANN}}\left(\pmb{d}_{i}(\pmb{R};R_{\mathrm{c}})\right), (21)

where $E^{\mathrm{ANN}}$, $E_{t_{i}}^{\mathrm{ANN}}$, and $\pmb{d}_{i}$ are the total energy, the atomic energy of the $i$-th atom, and the descriptor of the atomic environment centered on the $i$-th atom, respectively. There are many kinds of descriptors. For example, in the case of the Chebyshev descriptor [11], the descriptor is defined as

\pmb{d}_{i}(\pmb{R};R_{\mathrm{c}})=\begin{pmatrix}d_{i;0}(\pmb{R};R_{\mathrm{c}})\\ \vdots\\ d_{i;N_{0}}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}=\begin{pmatrix}\pmb{d}_{i}^{(\mathrm{R})}(\pmb{R};R_{\mathrm{c}})\\ \pmb{d}_{i}^{(\mathrm{A})}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}, (22)
\pmb{d}_{i}^{(\mathrm{R})}(\pmb{R};R_{\mathrm{c}})=\begin{pmatrix}\pmb{d}_{i}^{(\mathrm{r})}(\pmb{R};R_{\mathrm{c}})\\ \pmb{d}_{i}^{(\mathrm{r,w})}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}=\begin{pmatrix}(d_{i;0}^{(\mathrm{r})}(\pmb{R};R_{\mathrm{c}}),\cdots,d_{i;N^{(\mathrm{r})}}^{(\mathrm{r})}(\pmb{R};R_{\mathrm{c}}))^{\mathrm{T}}\\ (d_{i;0}^{(\mathrm{r,w})}(\pmb{R};R_{\mathrm{c}}),\cdots,d_{i;N^{(\mathrm{r})}}^{(\mathrm{r,w})}(\pmb{R};R_{\mathrm{c}}))^{\mathrm{T}}\end{pmatrix}, (23)
\pmb{d}_{i}^{(\mathrm{A})}(\pmb{R};R_{\mathrm{c}})=\begin{pmatrix}\pmb{d}_{i}^{(\mathrm{a})}(\pmb{R};R_{\mathrm{c}})\\ \pmb{d}_{i}^{(\mathrm{a,w})}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}=\begin{pmatrix}(d_{i;0}^{(\mathrm{a})}(\pmb{R};R_{\mathrm{c}}),\cdots,d_{i;N^{(\mathrm{a})}}^{(\mathrm{a})}(\pmb{R};R_{\mathrm{c}}))^{\mathrm{T}}\\ (d_{i;0}^{(\mathrm{a,w})}(\pmb{R};R_{\mathrm{c}}),\cdots,d_{i;N^{(\mathrm{a})}}^{(\mathrm{a,w})}(\pmb{R};R_{\mathrm{c}}))^{\mathrm{T}}\end{pmatrix}, (24)
d_{i;s}^{(\mathrm{r})}(\pmb{R};R_{\mathrm{c}})=\sum_{j=1,j\neq i}^{N}T_{s}(\bar{R}(\pmb{R}_{ij};R_{\mathrm{c}}))f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}}), (25)
d_{i;s}^{(\mathrm{r},\mathrm{w})}(\pmb{R};R_{\mathrm{c}})=\sum_{j=1,j\neq i}^{N}T_{s}(\bar{R}(\pmb{R}_{ij};R_{\mathrm{c}}))f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})w_{t_{j}}, (26)
d_{i;s}^{(\mathrm{a})}(\pmb{R};R_{\mathrm{c}})=\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}T_{s}\!\left(c(\pmb{R}_{ij},\pmb{R}_{ik})\right)f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})f_{\mathrm{c}}(\pmb{R}_{ik};R_{\mathrm{c}}), (27)
d_{i;s}^{(\mathrm{a},\mathrm{w})}(\pmb{R};R_{\mathrm{c}})=\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}T_{s}\!\left(c(\pmb{R}_{ij},\pmb{R}_{ik})\right)f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})f_{\mathrm{c}}(\pmb{R}_{ik};R_{\mathrm{c}})w_{t_{j}}w_{t_{k}}, (28)

where

\bar{R}(\pmb{R};R_{\mathrm{c}})=2\frac{|\pmb{R}|}{R_{\mathrm{c}}}-1, (29)
c(\pmb{R},\pmb{R}^{\prime})=\frac{\pmb{R}\cdot\pmb{R}^{\prime}}{|\pmb{R}||\pmb{R}^{\prime}|}, (30)
N_{0}=2N^{(\mathrm{r})}+2N^{(\mathrm{a})}+3, (31)

$T_{s}(x)$ is the $s$-th Chebyshev polynomial of the first kind, and $f_{\mathrm{c}}(\pmb{R};R_{\mathrm{c}})$ is a cutoff function that smoothly goes to zero at $R_{\mathrm{c}}$. For example, the following function is used in many cases:

f_{\mathrm{c}}(\pmb{R};R_{\mathrm{c}})=\begin{cases}\frac{1}{2}\left[\cos\!\left(\pi\frac{|\pmb{R}|}{R_{\mathrm{c}}}\right)+1\right]&(0\leq|\pmb{R}|\leq R_{\mathrm{c}})\\ 0&(R_{\mathrm{c}}<|\pmb{R}|)\end{cases}. (32)
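A sketch of the radial part of the descriptor, Eqs. (25) and (29) together with the cutoff of Eq. (32), is given below (plain Julia; the function names, the neighbour distances, and the parameter values are our own illustrations).

# Cutoff function, Eq. (32).
fc(R, Rc) = R <= Rc ? 0.5 * (cos(pi * R / Rc) + 1) : 0.0

# Map a distance to [-1, 1], Eq. (29).
rbar(R, Rc) = 2R / Rc - 1

# Radial Chebyshev descriptor components d_{i;s}^(r), Eq. (25), for one atom,
# given the distances to its neighbours within the cutoff sphere.
function radial_descriptor(distances::Vector{Float64}, Nr::Int, Rc::Float64)
    d = zeros(Nr + 1)
    for R in distances
        x = clamp(rbar(R, Rc), -1.0, 1.0)
        for s in 0:Nr
            d[s + 1] += cos(s * acos(x)) * fc(R, Rc)   # T_s(x) * f_c(R)
        end
    end
    return d
end

println(radial_descriptor([2.1, 2.5, 3.3], 5, 6.0))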

We introduce the shorthand notations as

\pmb{d}^{(\cdot)}_{i}=\pmb{d}^{(\cdot)}_{i}(\pmb{R};R_{\mathrm{c}})\,, (33)
d^{(\cdot)}_{i;s}=d^{(\cdot)}_{i;s}(\pmb{R};R_{\mathrm{c}})\,, (34)
\bar{R}_{ij}=\bar{R}(\pmb{R}_{ij};R_{\mathrm{c}})\,, (35)
c_{ijk}=c(\pmb{R}_{ij},\pmb{R}_{ik})\,, (36)
\bar{f}_{\mathrm{c}}(R_{ij})=f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})\,. (37)

When one considers an MLP with two hidden layers, the $i$-th atomic energy is given as

E_{t_{i}}^{\mathrm{ANN}}(\pmb{d}_{i})=\sum_{\ell_{2}=1}^{N_{2}}W^{(3)}_{t_{i};1\ell_{2}}x^{(2)}_{i;\ell_{2}}+b_{t_{i}}^{(3)}\,, (38)
x^{(2)}_{i;\ell_{2}}=\sigma^{(2)}\!\left(\sum_{\ell_{1}=1}^{N_{1}}W^{(2)}_{t_{i};\ell_{2}\ell_{1}}x^{(1)}_{i;\ell_{1}}+b_{t_{i};\ell_{2}}^{(2)}\right), (39)
x^{(1)}_{i;\ell_{1}}=\sigma^{(1)}\!\left(\sum_{\ell_{0}=0}^{N_{0}}W^{(1)}_{t_{i};\ell_{1}\ell_{0}}\bar{d}_{i;\ell_{0}}(\pmb{R};R_{\mathrm{c}})+b_{t_{i};\ell_{1}}^{(1)}\right), (40)

where $\sigma^{(l)}(x)$ is an activation function of the $l$-th layer, such as $\tanh(x)$, and $\bar{\pmb{d}}_{i}$ is the normalized descriptor, whose elements are defined as

\bar{d}_{i;\ell_{0}}=\frac{d_{i;\ell_{0}}-m_{t_{i};\ell_{0}}}{\sigma_{t_{i};\ell_{0}}}=\begin{cases}\frac{d_{i;\ell_{0}}^{(\mathrm{R})}-m_{t_{i};\ell_{0}}^{(\mathrm{R})}}{\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}}&(\ell_{0}\leq 2N^{(\mathrm{r})}+1)\\ \frac{d_{i;\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}-m_{t_{i};\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}}{\sigma_{t_{i};\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}}&(2N^{(\mathrm{r})}+1<\ell_{0})\end{cases}, (41)

where

m_{t_{i};\ell_{0}}=\begin{cases}m_{t_{i};\ell_{0}}^{(\mathrm{R})}&(\ell_{0}\leq 2N^{(\mathrm{r})}+1)\\ m_{t_{i};\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}&(2N^{(\mathrm{r})}+1<\ell_{0})\end{cases}, (42)
\sigma_{t_{i};\ell_{0}}=\begin{cases}\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}&(\ell_{0}\leq 2N^{(\mathrm{r})}+1)\\ \sigma_{t_{i};\ell_{0}-2N^{(\mathrm{r})}-2}^{(\mathrm{A})}&(2N^{(\mathrm{r})}+1<\ell_{0})\end{cases}. (43)

Here, $m_{t_{i};\ell_{0}}$ and $\sigma_{t_{i};\ell_{0}}$ are the mean value and standard deviation over the total data set. The ANN potential can be represented as a KAN, as shown in Fig. 5.

Figure 5: The schematic of ANN potential. (a) The total energy is defined by the summation of atomic energies. (b) The atomic energy is given by the ANN, which is described by a KAN.

Let us discuss the relation between the KAN and ANN potential. We focus on the first layer shown in Eq. (40). This equation is rewritten as

x^{(1)}_{i;\ell_{1}}=\sigma^{(1)}\!\left(z^{(\mathrm{r})}_{i;\ell_{1}}+z^{(\mathrm{a})}_{i;\ell_{1}}+b_{t_{i};\ell_{1}}^{(1)}\right), (44)

where we define $z^{(\mathrm{r})}_{i;\ell_{1}}$ and $z^{(\mathrm{a})}_{i;\ell_{1}}$ as

z^{(\mathrm{r})}_{i;\ell_{1}}=\sum_{\ell_{0}=0}^{2N^{(\mathrm{r})}+1}W^{(1)}_{t_{i};\ell_{1}\ell_{0}}\frac{d_{i;\ell_{0}}^{(\mathrm{R})}-m_{t_{i};\ell_{0}}^{(\mathrm{R})}}{\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}}=\sum_{\ell_{0}=0}^{2N^{(\mathrm{r})}+1}\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{1}\ell_{0}}d_{i;\ell_{0}}^{(\mathrm{R})}+b_{t_{i};\ell_{1}}^{(\mathrm{R})}\,, (45)
\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{1}\ell_{0}}=\frac{W^{(1)}_{t_{i};\ell_{1}\ell_{0}}}{\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}}\,, (46)
b_{t_{i};\ell_{1}}^{(\mathrm{R})}=-\sum_{\ell_{0}=0}^{2N^{(\mathrm{r})}+1}W^{(1)}_{t_{i};\ell_{1}\ell_{0}}\frac{m_{t_{i};\ell_{0}}^{(\mathrm{R})}}{\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}}\,, (47)

and

z^{(\mathrm{a})}_{i;\ell_{1}}=\sum_{\ell_{0}^{\prime}=0}^{2N^{(\mathrm{a})}+1}W^{(1)}_{t_{i};\ell_{1},\ell_{0}^{\prime}+2N^{(\mathrm{r})}+2}\frac{d_{i;\ell_{0}^{\prime}}^{(\mathrm{A})}-m_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}{\sigma_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}=\sum_{\ell_{0}^{\prime}=0}^{2N^{(\mathrm{a})}+1}\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{1}\ell_{0}^{\prime}}d_{i;\ell_{0}^{\prime}}^{(\mathrm{A})}+b_{t_{i};\ell_{1}}^{(\mathrm{A})}\,, (48)
\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{1}\ell_{0}^{\prime}}=\frac{W^{(1)}_{t_{i};\ell_{1},\ell_{0}^{\prime}+2N^{(\mathrm{r})}+2}}{\sigma_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}\,, (49)
b_{t_{i};\ell_{1}}^{(\mathrm{A})}=-\sum_{\ell_{0}^{\prime}=0}^{2N^{(\mathrm{a})}+1}W^{(1)}_{t_{i};\ell_{1},\ell_{0}^{\prime}+2N^{(\mathrm{r})}+2}\frac{m_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}{\sigma_{t_{i};\ell_{0}^{\prime}}^{(\mathrm{A})}}\,, (50)

respectively. By substituting Eqs. (25)–(28) into Eqs. (45) and (48), $z^{(\mathrm{r})}_{i;\ell_{1}}$ and $z^{(\mathrm{a})}_{i;\ell_{1}}$ are given as

z^{(\mathrm{r})}_{i;\ell_{1}}=\sum_{j=1,j\neq i}^{N}\Phi_{t_{i},t_{j};\ell_{1}}^{(\mathrm{r})}(\bar{R}_{ij})\,\bar{f}_{\mathrm{c}}(R_{ij})+b_{t_{i};\ell_{1}}^{(\mathrm{R})}, (51)
z^{(\mathrm{a})}_{i;\ell_{1}}=\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}\Phi_{t_{i},t_{j},t_{k};\ell_{1}}^{(\mathrm{a})}(c_{ijk})\,\bar{f}_{\mathrm{c}}(R_{ij})\bar{f}_{\mathrm{c}}(R_{ik})+b_{t_{i};\ell_{1}}^{(\mathrm{A})}, (52)

where

\Phi_{t_{i},t_{j};\ell_{1}}^{(\mathrm{r})}(\bar{R})=\phi_{t_{i};\ell_{1}}^{(\mathrm{r})}(\bar{R})+\phi_{t_{i};\ell_{1}}^{(\mathrm{r,w})}(\bar{R})w_{t_{j}}, (53)
\Phi_{t_{i},t_{j},t_{k};\ell_{1}}^{(\mathrm{a})}(\cos\theta)=\phi_{t_{i};\ell_{1}}^{(\mathrm{a})}(\cos\theta)+\phi_{t_{i};\ell_{1}}^{(\mathrm{a,w})}(\cos\theta)w_{t_{j}}w_{t_{k}}, (54)
\phi_{t_{i};\ell_{1}}^{(\mathrm{r})}(\bar{R})=\sum_{\ell_{0}=0}^{N^{(\mathrm{r})}}\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{1}\ell_{0}}T_{\ell_{0}}(\bar{R}), (55)
\phi_{t_{i};\ell_{1}}^{(\mathrm{r,w})}(\bar{R})=\sum_{\ell_{0}=0}^{N^{(\mathrm{r})}}\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{1},\ell_{0}+N^{(\mathrm{r})}+1}T_{\ell_{0}}(\bar{R}), (56)
\phi_{t_{i};\ell_{1}}^{(\mathrm{a})}(\cos\theta)=\sum_{\ell_{0}^{\prime}=0}^{N^{(\mathrm{a})}}\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{1}\ell_{0}^{\prime}}T_{\ell_{0}^{\prime}}(\cos\theta), (57)
\phi_{t_{i};\ell_{1}}^{(\mathrm{a,w})}(\cos\theta)=\sum_{\ell_{0}^{\prime}=0}^{N^{(\mathrm{a})}}\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{1},\ell_{0}^{\prime}+N^{(\mathrm{a})}+1}T_{\ell_{0}^{\prime}}(\cos\theta). (58)

In terms of KAN, the one-dimensional non-linear functions $\phi_{t_{i};\ell_{1}}^{(\mathrm{r})}(\bar{R})$, $\phi_{t_{i};\ell_{1}}^{(\mathrm{r,w})}(\bar{R})$, $\phi_{t_{i};\ell_{1}}^{(\mathrm{a})}(\cos\theta)$, and $\phi_{t_{i};\ell_{1}}^{(\mathrm{a,w})}(\cos\theta)$ are regarded as activation functions in the KAN structure. The above equations show that these non-linear functions are expanded in Chebyshev polynomials.

IV Kolmogorov–Arnold Network descriptor

IV.1 Definition

We propose a trainable descriptor called “KAN descriptor” for interatomic potentials, defined as

\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})=\begin{pmatrix}\tilde{z}_{i;1}(\pmb{R};R_{\mathrm{c}})\\ \tilde{z}_{i;2}(\pmb{R};R_{\mathrm{c}})\\ \vdots\\ \tilde{z}_{i;\tilde{N}_{0}}(\pmb{R};R_{\mathrm{c}})\end{pmatrix}, (59)

where

\tilde{z}_{i;\ell_{0}}(\pmb{R};R_{\mathrm{c}})=\sum_{j=1,j\neq i}^{N}\tilde{\Psi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij};R_{\mathrm{c}})+\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}\tilde{\Psi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik};R_{\mathrm{c}})+\tilde{b}_{t_{i};\ell_{0}}\,, (60)
=\sum_{j=1,j\neq i}^{N}\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij})f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})+\sum_{\substack{j,k=1,j\neq k\\ j\neq i,k\neq i}}^{N}\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik})f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})f_{\mathrm{c}}(\pmb{R}_{ik};R_{\mathrm{c}})+\tilde{b}_{t_{i};\ell_{0}}. (61)

Here, $\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij})$ and $\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik})$ are radial (two-body) and angular (three-body) non-linear functions, which should be optimized.

The total energy is expressed as

E^{\mathrm{ANN}}(\pmb{R})=\sum_{i=1}^{N}\tilde{E}^{\mathrm{ANN}}_{t_{i}}(\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})), (62)

where $\tilde{E}^{\mathrm{ANN}}_{t_{i}}(\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}}))$ is the $i$-th atomic potential constructed by neural networks with the KAN descriptor $\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})$, as shown in Fig. 6. It should be noted that both the function $\tilde{E}^{\mathrm{ANN}}_{t_{i}}$ and the KAN descriptor $\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})$ are trainable. When one considers an MLP with one hidden layer, the $i$-th atomic energy is expressed as

\tilde{E}^{\mathrm{ANN}}_{t_{i}}(\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}}))=\sum_{\ell_{1}=1}^{\tilde{N}_{1}}\tilde{W}^{(2)}_{t_{i};\ell_{1}}\tilde{x}^{(1)}_{i;\ell_{1}}+\tilde{b}^{(2)}_{t_{i}}\,, (63)
\tilde{x}^{(1)}_{i;\ell_{1}}=\tilde{\sigma}^{(1)}\!\left(\sum_{\ell_{0}=1}^{\tilde{N}_{0}}\tilde{W}^{(1)}_{t_{i};\ell_{1}\ell_{0}}\tilde{x}^{(0)}_{i;\ell_{0}}+\tilde{b}^{(1)}_{t_{i};\ell_{1}}\right), (64)
\tilde{x}^{(0)}_{i;\ell_{0}}=\tilde{\sigma}^{(0)}\!\left(\tilde{z}_{i;\ell_{0}}(\pmb{R};R_{\mathrm{c}})\right). (65)

The network architecture with KAN descriptors is shown in Fig. 7.

Figure 6: The schematic of the $i_{0}$-th element of the Kolmogorov–Arnold network descriptor.
Figure 7: The schematic of artificial neural networks potential with the Kolmogorov–Arnold network descriptors.
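A compact sketch of Eqs. (63)–(65) is shown below (Julia with Flux.jl, the package used in Sec. V; the untrained weights, the random stand-in for the descriptor, and the choice of tanh for the activation functions are placeholders of our own).

using Flux

# One-hidden-layer MLP acting on a precomputed KAN descriptor, Eqs. (63)-(65).
# Sizes follow Sec. V: N0 = 10 descriptor components, N1 = 10 hidden units.
hidden = Dense(10 => 10, tanh)   # Eq. (64)
output = Dense(10 => 1)          # Eq. (63), linear readout

z   = randn(Float32, 10)         # stand-in for the KAN descriptor z_i
x0  = tanh.(z)                   # sigma^(0) applied elementwise, Eq. (65)
E_i = output(hidden(x0))[1]      # atomic energy contribution
println(E_i)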

In the original KAN layer shown in Eq. (8), the non-linear functions $\phi_{l,i_{l},i_{l-1}}(x)$ depend on the indices of the input and output neurons. In the KAN descriptor, however, the non-linear functions $\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij})$ and $\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik})$ do not depend on the index of the input neuron but only on the atomic species and the index of the output neuron, since atoms of the same type are indistinguishable. In Fig. 6, it looks as if there are $N_{i}$ distinct functions $\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}$ and $N_{i}^{2}$ distinct functions $\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}$ for each $\ell_{0}$. However, the number of non-linear functions in the KAN descriptor is usually much smaller than $N_{i}$ or $N_{i}^{2}$ and depends only on the number of atomic species. For example, consider 10 water molecules surrounding the $i$-th hydrogen atom in an H$_2$O system. In this case, only two $\tilde{\Phi}^{(\mathrm{r})}$ functions ($\tilde{\Phi}^{(\mathrm{r})}_{\mathrm{HH};\ell_{0}}$ and $\tilde{\Phi}^{(\mathrm{r})}_{\mathrm{HO};\ell_{0}}$) and three $\tilde{\Phi}^{(\mathrm{a})}$ functions ($\tilde{\Phi}^{(\mathrm{a})}_{\mathrm{HHH};\ell_{0}}$, $\tilde{\Phi}^{(\mathrm{a})}_{\mathrm{HHO};\ell_{0}}$, and $\tilde{\Phi}^{(\mathrm{a})}_{\mathrm{HOO};\ell_{0}}$) are required, although $N_{i}=30$.

These non-linear functions should be optimized to approximate the DFT energy. The non-linear functions can be expanded using B-spline functions, Gaussian radial basis functions, or Chebyshev polynomials. During the training procedure, the coefficients of the non-linear functions are adjusted. Once accurate non-linear functions with appropriate coefficients are obtained, their values can be evaluated using linear or spline interpolation. In this study, we employ the Chebyshev polynomials as

\tilde{\Phi}^{(\mathrm{r})}_{t_{i}t_{j};\ell_{0}}(\pmb{R}_{ij})=\tilde{\varphi}^{(\mathrm{r},0)}_{t_{i};\ell_{0}}(\bar{R}_{ij})+\tilde{\varphi}^{(\mathrm{r},1)}_{t_{i};\ell_{0}}(\bar{R}_{ij})w_{t_{j}}, (66)
\tilde{\varphi}^{(\mathrm{r},n)}_{t_{i};\ell_{0}}(R)=\sum_{s=0}^{\tilde{N}^{(\mathrm{r})}}\tilde{W}^{(\mathrm{r},n)}_{t_{i};\ell_{0}s}\,T_{s}(R), (67)
\tilde{\Phi}^{(\mathrm{a})}_{t_{i}t_{j}t_{k};\ell_{0}}(\pmb{R}_{ij},\pmb{R}_{ik})=\tilde{\varphi}^{(\mathrm{a},0)}_{t_{i};\ell_{0}}(c_{ijk})+\tilde{\varphi}^{(\mathrm{a},1)}_{t_{i};\ell_{0}}(c_{ijk})w_{t_{j}}w_{t_{k}}, (68)
\tilde{\varphi}^{(\mathrm{a},n)}_{t_{i};\ell_{0}}(\cos\theta)=\sum_{s=0}^{\tilde{N}^{(\mathrm{a})}}\tilde{W}^{(\mathrm{a},n)}_{t_{i};\ell_{0}s}\,T_{s}(\cos\theta). (69)

We now comment on the representation ability of the non-linear functions. The number of trainable parameters in the KAN descriptor $\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}})$ is $2\tilde{N}_{0}(\tilde{N}^{(\mathrm{r})}+1)+2\tilde{N}_{0}(\tilde{N}^{(\mathrm{a})}+1)$. In addition, these non-linear functions depend on the atomic species in the same way as the original Chebyshev descriptor [11]. Therefore, the KAN descriptor avoids the combinatorial growth with respect to the number of atomic species that affects the symmetry functions [8].
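For example, with the settings used in Sec. V for a single element, $\tilde{N}_{0}=10$ and $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(20,20)$, this count amounts to $2\times 10\times 21+2\times 10\times 21=840$ trainable parameters in the descriptor.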

IV.2 Relation between the Chebyshev and the Kolmogorov–Arnold network descriptors

The relations between the coefficients of the non-linear functions of the KAN descriptor and the coefficients defined in Eqs. (45)–(50) are expressed as

\tilde{b}_{t_{i};\ell_{0}}=b_{t_{i};\ell_{0}}^{(1)}+b_{t_{i};\ell_{0}}^{(\mathrm{R})}+b_{t_{i};\ell_{0}}^{(\mathrm{A})}, (70)
\tilde{W}^{(\mathrm{r},0)}_{t_{i};\ell_{0}n}=\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{0}n}, (71)
\tilde{W}^{(\mathrm{r},1)}_{t_{i};\ell_{0}n}=\bar{W}^{(1,\mathrm{R})}_{t_{i};\ell_{0},n+\tilde{N}^{(\mathrm{r})}+1}, (72)
\tilde{W}^{(\mathrm{a},0)}_{t_{i};\ell_{0}n}=\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{0}n}, (73)
\tilde{W}^{(\mathrm{a},1)}_{t_{i};\ell_{0}n}=\bar{W}^{(1,\mathrm{A})}_{t_{i};\ell_{0},n+\tilde{N}^{(\mathrm{a})}+1}. (74)

In the previous Chebyshev descriptor, $\sigma_{t_{i};\ell_{0}}^{(\mathrm{R})}$, $\sigma_{t_{i};\ell_{0}}^{(\mathrm{A})}$, $m_{t_{i};\ell_{0}}^{(\mathrm{R})}$, and $m_{t_{i};\ell_{0}}^{(\mathrm{A})}$ are computed from the data set. In the KAN descriptor, these variables are trainable. We note that, since the KAN descriptor is trainable, an MLP with two hidden layers acting on the Chebyshev descriptor can be regarded as an MLP with one hidden layer acting on the KAN descriptor.

V Results

V.1 Target system and method

We consider iron with vacancies and dislocations. We use the training dataset available on GitHub, as described in Ref. [29]. The KAN descriptor and MLP are implemented using the machine-learning package Flux.jl version 0.14.11, written in the Julia language [30, 31]. The sizes of the total dataset, training dataset, and test dataset are 6348, 5713, and 635, respectively. The energy of each atomic structure is calculated using Quantum Espresso, a first-principles calculation tool based on density functional theory [32]. The details of the calculation are provided in Ref. [29]. We adopt an MLP with one hidden layer for $\tilde{E}_{i}^{\mathrm{ANN}}(\tilde{\pmb{z}}_{i}(\pmb{R};R_{\mathrm{c}}))$. The dimension of the input vector $\tilde{\pmb{z}}$ and the number of units in the hidden layer are set to $\tilde{N}_{0}=10$ and $\tilde{N}_{1}=10$, respectively. To train the ANN potential, we use the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm implemented in the Optim.jl package [33]. For each optimization with L-BFGS, we perform 10 iterations to reduce the loss function defined as

\mathrm{MSE}=\frac{1}{N_{\mathrm{batch}}}\sum_{k=1}^{N_{\mathrm{batch}}}\left(E^{\mathrm{FP}(k)}-E^{\mathrm{ANN}(k)}\right)^{2}. (75)

Here, the batch size $N_{\mathrm{batch}}$ used with the Optim.jl package is 100. $E^{\mathrm{FP}(k)}$ and $E^{\mathrm{ANN}(k)}$ are the potential energies calculated by first-principles calculation and by the ANN, respectively.
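A condensed sketch of this training setup is given below (Julia with Optim.jl, the optimizer package named above; the linear model, the random toy data, and all variable names are placeholders of our own, so only the mean-squared-error loss of Eq. (75) and the 10-iteration L-BFGS call mirror the actual procedure).

using Optim

# Placeholder model: a linear map standing in for the ANN potential with KAN descriptors.
model(p, x) = sum(p .* x)

# Toy batch of descriptor vectors and reference (first-principles) energies.
xs = [randn(10) for _ in 1:100]
Es = randn(100)

# Mean-squared error over the batch, Eq. (75).
mse(p) = sum((Es[k] - model(p, xs[k]))^2 for k in 1:length(xs)) / length(xs)

# 10 L-BFGS iterations per optimization call, as in the text.
p0  = zeros(10)
res = optimize(mse, p0, LBFGS(), Optim.Options(iterations = 10))
println(Optim.minimum(res))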

V.2 Shape of the non-linear functions in KAN descriptors

Let us discuss the shape of the non-linear functions in the KAN descriptors. We consider two parameter sets, $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(20,20)$ and $(50,20)$. As shown in Fig. 8, the shape of the non-linear functions $\tilde{\Phi}^{(\mathrm{r})}_{\mathrm{FeFe};\ell_{0}}(\pmb{R}_{ij})f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})$ in the KAN descriptors depends on $\tilde{N}^{(\mathrm{r})}$.

Figure 8: Two-body non-linear functions $\tilde{\Phi}^{(\mathrm{r})}_{\mathrm{FeFe};\ell_{0}}(\pmb{R}_{ij})f_{\mathrm{c}}(\pmb{R}_{ij};R_{\mathrm{c}})$ in the KAN descriptor. (Top panel) $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(20,20)$ and (bottom panel) $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(50,20)$.

In the region $R_{ij}<2$ [Å], the curves depend strongly on $\tilde{N}^{(\mathrm{r})}$. It should be noted that there are no data with $R_{ij}<2$ [Å] in the training and test data sets. As shown in Fig. 9, the distribution of the interatomic distance $R_{ij}$ is not uniform. In regions where no data are found in the training data set, the shape of the non-linear function in the KAN descriptor cannot affect the quality of the ANN potential.

Figure 9: Radial distribution function in the total data set.

In terms of KANs, if the non-linear functions are well optimized for a fixed number of functions, the loss should converge as the number of Chebyshev polynomials increases. As shown in Fig. 10, the limit of the expressive power of the KAN descriptors is reached as the number of Chebyshev polynomials for the radial non-linear functions increases.

Figure 10: $\tilde{N}^{(\mathrm{r})}$ dependence of the mean absolute error. We fix the number of angular Chebyshev basis functions ($\tilde{N}^{(\mathrm{a})}=20$), the dimension of the input vector constructed from the KAN descriptors, and the number of units in the hidden layer of the neural network.

V.3 Fast evaluation of the Kolmogorov–Arnold network descriptors

We propose a method for fast evaluation of the ANN potential with KAN descriptors. The conventional ANN potential constructed from the Chebyshev polynomial descriptor can be regarded as an ANN potential with KAN descriptors. The computational cost of evaluating the KAN descriptor is $\mathcal{O}(\tilde{N}^{(\mathrm{r})})+\mathcal{O}(\tilde{N}^{(\mathrm{a})})$. With the use of linear or spline interpolation on $N$ points, the computational cost becomes $\mathcal{O}(\log N)$, which does not depend on the order of the Chebyshev polynomials. We note that the computational cost of the forces can also be reduced with the use of interpolations.
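A minimal sketch of this replacement is shown below (Julia, assuming the linear_interpolation and cubic_spline_interpolation convenience constructors of Interpolations.jl, the package used in Sec. V; the "trained" function is a toy Chebyshev expansion of our own): the non-linear function is tabulated once on a grid and then evaluated through an interpolant instead of re-summing the polynomial at every distance.

using Interpolations

# Toy "trained" radial non-linear function: a fixed Chebyshev expansion on [-1, 1].
coeffs = [0.3, -0.8, 0.5, 0.1, -0.2]
phi(x) = sum(coeffs[s + 1] * cos(s * acos(clamp(x, -1.0, 1.0))) for s in 0:length(coeffs) - 1)

# Tabulate once on a uniform grid, then build the interpolants.
xs  = range(-1.0, 1.0; length = 200)
ys  = phi.(xs)
lin = linear_interpolation(xs, ys)
spl = cubic_spline_interpolation(xs, ys)

x = 0.37
println((phi(x), lin(x), spl(x)))   # direct Chebyshev sum vs. interpolated values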

We now discuss how the computational time and accuracy depend on the number of interpolation points. In the middle panel of Fig. 11, we show the residual norm of the KAN descriptors, defined as

\frac{\left\|\tilde{\pmb{z}}-\tilde{\pmb{z}}^{\mathrm{interpolation}}\right\|}{\|\tilde{\pmb{z}}\|}, (76)

where $\tilde{\pmb{z}}^{\mathrm{interpolation}}$ is the input vector constructed by interpolation, and $\|\pmb{A}\|$ denotes the Euclidean norm. We adopt the linear interpolation and the natural cubic spline interpolation implemented in Interpolations.jl. The elapsed time is measured on a MacBook Pro (14 inch, 2023) with an Apple M2 Max processor. The parameters of the Chebyshev polynomials are $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(50,20)$. We should note that our Julia code might not be fully optimized; therefore, we discuss the elapsed time only qualitatively. As shown in Fig. 11, the elapsed time of each interpolation depends only weakly on $N$, which is consistent with the theoretical $\mathcal{O}(\log N)$ cost. We also discuss the mean absolute error (MAE), defined as

\mathrm{MAE}=\frac{1}{M}\sum_{k=1}^{M}\left|E^{(k)}-E^{\mathrm{ANN}(k)}\right|, (77)

where $M$ is the size of the test data set. To assess the accuracy of the interpolated KAN descriptors, we calculate the difference between the MAE of the ANN potential with Chebyshev polynomials and that with the interpolated KAN descriptors. The MAE of the Chebyshev descriptors is 2.05796 meV/atom. As shown in the bottom panel of Fig. 11, the difference decreases as the number of interpolation points increases.

Figure 11: (Top panel) Elapsed time to calculate the mean absolute error of the test data. (Middle panel) The residual norm of the KAN descriptors. (Bottom panel) The difference between the MAE of the Chebyshev descriptors and that of the KAN descriptors. The parameters are $(\tilde{N}^{(\mathrm{r})},\tilde{N}^{(\mathrm{a})})=(50,20)$.

We should note that, for actual machine-learning MD simulations, it would be better to implement the KAN descriptor in an open-source ANN potential package such as ænet [11]. Since the implementation should not be complicated, we plan to provide such software in the future.

VI Conclusion

In this study, we have integrated KAN into MD simulations to enhance the efficiency of potential energy models. Our investigation reveals that the KAN framework can represent commonly used potentials, such as the LJ, EAM, and ANN potentials. This reinterpretation allows for the application of KAN’s non-linear functions to introduce the KAN descriptor for ANN potentials. Our results demonstrate that by employing linear or cubic spline interpolations for these KAN functions, computational savings can be achieved without reducing accuracy. Future work will focus on further refining the KAN descriptors and integrating them into open-source ANN potential packages such as ænet.

Acknowledgements.
This work was partially supported by JSPS KAKENHI Grant Nos. 22H05111, 22H05114, 22K03539, and 23H03996. This work was also partially supported by the "Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN)" and the "High Performance Computing Infrastructure (HPCI)" in Japan (Project IDs: jh240028 and jh240059). We would like to thank Dr. Mitsuhiro Itakura for valuable discussions. Additionally, we are grateful to Dr. Hideki Mori for providing information about his training dataset for iron with vacancies and dislocations.

References

  • Rahman and Stillinger [1971] A. Rahman and F. H. Stillinger, Molecular dynamics study of liquid water, J. Chem. Phys. 55, 3336 (1971).
  • Wang et al. [2010] S. Wang, S. Li, Z. Cao, and T. Yan, Molecular dynamic simulations of ionic liquids at graphite surface, J. Phys. Chem. C Nanomater. Interfaces 114, 990 (2010).
  • Kubicki and Lasaga [1988] J. Kubicki and A. Lasaga, Molecular dynamics simulations of SiO 2 melt and glass; ionic and covalent models, American Mineralogist 73, 941 (1988).
  • Engel et al. [2015] M. Engel, P. F. Damasceno, C. L. Phillips, and S. C. Glotzer, Computational self-assembly of a one-component icosahedral quasicrystal, Nat. Mater. 14, 109 (2015).
  • Kobayashi et al. [2023] K. Kobayashi, M. Okumura, H. Nakamura, M. Itakura, M. Machida, S. Urata, and K. Suzuya, Machine learning molecular dynamics reveals the structural origin of the first sharp diffraction peak in high-density silica glasses, Sci. Rep. 13, 18721 (2023).
  • Iftimie et al. [2005] R. Iftimie, P. Minary, and M. E. Tuckerman, Ab initio molecular dynamics: concepts, recent developments, and future trends, Proc. Natl. Acad. Sci. U. S. A. 102, 6654 (2005).
  • Mouvet et al. [2022] F. Mouvet, J. Villard, V. Bolnykh, and U. Rothlisberger, Recent advances in first-principles based molecular dynamics, Acc. Chem. Res. 55, 221 (2022).
  • Behler and Parrinello [2007] J. Behler and M. Parrinello, Generalized neural-network representation of high-dimensional potential-energy surfaces, Phys. Rev. Lett. 98, 146401 (2007).
  • Nagai et al. [2020] Y. Nagai, M. Okumura, K. Kobayashi, and M. Shiga, Self-learning hybrid monte carlo: A first-principles approach, Phys. Rev. B Condens. Matter 102, 041124 (2020).
  • Nagai et al. [2024] Y. Nagai, Y. Iwasaki, K. Kitahara, Y. Takagiwa, K. Kimura, and M. Shiga, High-Temperature atomic diffusion and specific heat in quasicrystals, Phys. Rev. Lett. 132, 196301 (2024).
  • Artrith et al. [2017] N. Artrith, A. Urban, and G. Ceder, Efficient and accurate machine-learning interpolation of atomic energies in compositions with many species, Phys. Rev. B 96, 014112 (2017).
  • Batzner et al. [2022] S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials, Nat. Commun. 13, 2453 (2022).
  • Musaelian et al. [2023] A. Musaelian, S. Batzner, A. Johansson, L. Sun, C. J. Owen, M. Kornbluth, and B. Kozinsky, Learning local equivariant representations for large-scale atomistic dynamics, Nat. Commun. 14, 579 (2023).
  • Bartók et al. [2010] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons, Phys. Rev. Lett. 104, 136403 (2010).
  • Thölke and De Fabritiis [2022] P. Thölke and G. De Fabritiis, TorchMD-NET: Equivariant transformers for neural network based molecular potentials, arXiv [cs.LG]  (2022).
  • Sivaraman et al. [2020] G. Sivaraman, A. N. Krishnamoorthy, M. Baur, C. Holm, M. Stan, G. Csányi, C. Benmore, and Á. Vázquez-Mayagoitia, Machine-learned interatomic potentials by active learning: amorphous and liquid hafnium dioxide, Npj Comput. Mater. 6, 1 (2020).
  • Wang et al. [2024] Y. Wang, T. Wang, S. Li, X. He, M. Li, Z. Wang, N. Zheng, B. Shao, and T.-Y. Liu, Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing, Nat. Commun. 15, 313 (2024).
  • Liu et al. [2024] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljačić, T. Y. Hou, and M. Tegmark, KAN: Kolmogorov-arnold networks, arXiv [cs.LG]  (2024).
  • Kolmogorov [1956] A. Kolmogorov, On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables, Proceedings of the USSR Academy of Sciences 108, 179 (1956).
  • Braun and Griebel [2009] J. Braun and M. Griebel, On a constructive proof of kolmogorov’s superposition theorem, Constr. Approx. 30, 653 (2009).
  • Bhattacharjee [2024] S. S. Bhattacharjee, Torchkan: Simplified kan model with variations, https://github.com/1ssb/torchkan/ (2024).
  • Nagai [2024] Y. Nagai, Fluxkan: Julia version of the torchkan, https://github.com/cometscome/FluxKAN.jl (2024).
  • Ss et al. [2024] S. Ss, K. Ar, G. R, and A. Kp, Chebyshev polynomial-based kolmogorov-arnold networks: An efficient architecture for nonlinear function approximation, arXiv [cs.LG]  (2024).
  • Li [2024] Z. Li, Kolmogorov-arnold networks are radial basis function networks, arXiv [cs.LG]  (2024).
  • Schwerdtfeger and Wales [2024] P. Schwerdtfeger and D. J. Wales, 100 years of the lennard-jones potential, J. Chem. Theory Comput. 20, 3379 (2024).
  • Daw and Baskes [1984] M. S. Daw and M. I. Baskes, Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals, Phys. Rev. B 29, 6443 (1984).
  • Takamoto et al. [2022] S. Takamoto, C. Shinagawa, D. Motoki, K. Nakago, W. Li, I. Kurata, T. Watanabe, Y. Yayama, H. Iriguchi, Y. Asano, T. Onodera, T. Ishii, T. Kudo, H. Ono, R. Sawada, R. Ishitani, M. Ong, T. Yamaguchi, T. Kataoka, A. Hayashi, N. Charoenphakdee, and T. Ibuka, Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements, Nat. Commun. 13, 2991 (2022).
  • Batatia et al. [2022] I. Batatia, D. P. Kovács, G. N. C. Simm, C. Ortner, and G. Csányi, MACE: Higher order equivariant message passing neural networks for fast and accurate force fields, arXiv [stat.ML]  (2022).
  • Mori et al. [2023] H. Mori, T. Tsuru, M. Okumura, D. Matsunaka, Y. Shiihara, and M. Itakura, Dynamic interaction between dislocations and obstacles in bcc iron based on atomic potentials derived using neural networks, Phys. Rev. Mater. 7, 063605 (2023).
  • Innes et al. [2018] M. Innes, E. Saba, K. Fischer, D. Gandhi, M. C. Rudilosso, N. M. Joy, T. Karmali, A. Pal, and V. Shah, Fashionable modelling with flux, CoRR abs/1811.01457 (2018), arXiv:1811.01457.
  • Innes [2018] M. Innes, Flux: Elegant machine learning with julia, Journal of Open Source Software 10.21105/joss.00602 (2018).
  • Giannozzi et al. [2009] P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. Dal Corso, S. de Gironcoli, S. Fabris, G. Fratesi, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov, P. Umari, and R. M. Wentzcovitch, QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials, J. Phys. Condens. Matter 21, 395502 (2009).
  • Mogensen and Riseth [2018] P. K. Mogensen and A. N. Riseth, Optim: A mathematical optimization package for Julia, Journal of Open Source Software 3, 615 (2018).