
LAMMPS Implementation of Rapid Artificial Neural Network Derived Interatomic Potentials

D. Dickel1,2, M. Nitol2, C. D. Barrett1,2
Abstract

While machine learning approaches have been successfully used to represent interatomic potentials, their speed has typically lagged behind conventional formalisms. This is often due to the complexity of the structural fingerprints used to describe the local atomic environment and to the large cutoff radii and neighbor lists used in calculating these fingerprints. Even recent machine-learned methods are at least 10 times slower than traditional formalisms. An implementation of a rapid artificial neural network (RANN) style potential in the LAMMPS molecular dynamics package is presented here which utilizes angular screening to reduce computational complexity without reducing accuracy. For the smallest neural network architectures, this formalism rivals the modified embedded atom method (MEAM) in speed and accuracy, while networks running at approximately one third the speed of MEAM reproduce the training database with chemical accuracy. The numerical accuracy of the LAMMPS implementation is assessed by verifying conservation of energy, by confirming that the calculated forces and pressures agree with numerical derivatives of the energy, and by assessing the stability of the potential in dynamic simulation. The potential style is tested using a force field for magnesium, and the computational efficiency of a variety of architectures is compared to traditional potential models as well as to alternative ANN formalisms. The predictive accuracy is found to rival that of slower methods.

Keywords. Machine Learning; Artificial Neural Networks; Molecular Dynamics;

1 Introduction

Artificial neural networks (ANNs) and other machine learning techniques are gaining prominence as a method for developing classical interatomic potentials for use in molecular statics (MS) and molecular dynamics (MD) simulation [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Because of their ability to accurately fit large and complex data sets and because they can be rapidly optimized, ANNs have shown the potential to predict atomic energies and forces more accurately than most existing classical formalisms [6, 7]. By using density functional theory (DFT), which is relatively expensive computationally, to produce large training databases for the ANN, potentials which reproduce this data to high precision at much faster computational speeds can be created. While only a limited number of large scale dynamic simulations have been performed using these potentials, static simulations for a variety of materials have been able to reproduce ab initio forces and energies with chemical accuracy [11, 2, 12, 5].
ANNs used to model interatomic potentials behave similarly to traditional formalisms: an energy is defined for each individual atom based on the local atomic arrangement and the atomic species of it and its neighbors, mapping the local environment directly to numerical values for the energy and forces. This encoding of the local environment is referred to as the structural fingerprint. Thompson et al. developed the Spectral Neighbor Analysis Method (SNAP) [13], which uses machine-learning techniques on components of the local neighbor density. Behler and Parrinello [14, 15] defined a set of functions based on Gaussians which could be used to generate the fingerprint. This fingerprint formalism has since been used by a number of authors as the basis for an ANN. The celebrated Gaussian Approximation Potentials (GAP) [16] have been used to develop potentials with chemical accuracy for a variety of systems including tungsten [17], lithium-carbon [18], iron [19], and water [20]. Artrith and Urban used the Behler-Parrinello basis functions to create a potential for titanium dioxide using an ANN with two hidden layers of 10 neurons each, which accurately described the various crystal phases of TiO2 [2]. Artrith, Urban, and Ceder have also recently proposed an efficient scheme for the generation of machine-learned potentials for composite materials with many atomic species [21]. Takahashi et al. [11] used a combination of the Gaussian formalism along with insights from the modified embedded atom method (MEAM) [22] to generate a fingerprint of several thousand values, which was then used to fit a potential for titanium using linear ridge regression. Recently, a method using the embedding functions from MEAM has been used to generate potentials for titanium and zinc using a single hidden layer [23, 24]. Singraber, Behler, and Dellago [25] recently presented the n2p2 neural network package, which uses the Behler symmetry functions in the construction of the fingerprint. This package has been used to develop a number of potentials, including for Al-Cu [26] and Mg [27].
While the optimal structural fingerprint and network architecture for an ANN are still unclear, and indeed are probably strongly dependent on the material system being examined, there is a need to be able to perform large scale dynamic calculations with these potentials. Additionally, while machine learned potentials greatly outperform ab initio calculations in terms of efficiency, they still lag behind conventional formalisms such as the embedded atom method (EAM) [28] and MEAM, which have been the standard for the analysis of metals since their introduction over 25 years ago. MEAM, moreover, takes advantage of angular screening to limit the size of the neighbor list and incorporate the known shielding effect, improving performance. To this end, we have created an implementation of a rapid artificial neural network (RANN) potential formalism for use in LAMMPS [29]. This implementation accepts arbitrary multi-layer perceptron (MLP) architectures and structural fingerprints of arbitrary size. Sample potentials have been generated from the MEAM based structural fingerprints with angular screening, though novel functions or fingerprint types can be added independently. Section 2 of this paper reviews the formalism for ANN interatomic potentials and presents the specific structural fingerprints implemented, including a discussion of the use of angular screening. Section 3 describes the framework which has been implemented in LAMMPS. Section 4 demonstrates a number of simple static and dynamic calculations using an ANN potential. In particular, conservation of energy is shown along with accurate calculation of the forces and pressure, and the computational efficiency of this potential style is compared to that of other common styles for metal systems. It is demonstrated that, for the network architectures capable of reproducing a magnesium training database to within 1 meV/atom, performance approximately one third that of MEAM can be achieved. Additionally, the performance of the potential as a function of neural network architecture and neighbor list size is considered.

2 Artificial Neural Network Architecture

Most machine-learned interatomic potentials make use of MLPs to predict the energy of individual atoms in a system given their environment. Details of the training and optimization of these networks against ab initio data can be found elsewhere [2, 23, 30], but the general structure of a network used as an interatomic potential is as follows. The energy of a particular atom $i$, determined by its environment, is the output of the last of the $N$ layers of the neural network. The values of any particular layer after the first, $\mathbf{A}^{n}$, are determined by the previous layer and the weight and bias matrices $\mathbf{W}^{n}$ and $\mathbf{B}^{n}$:

$Z^{n}_{l_{n}} = \sum_{l_{n-1}} W^{n}_{l_{n}l_{n-1}} A^{n-1}_{l_{n-1}} + B^{n}_{l_{n}}$  (1)
$A^{n}_{l_{n}} = g^{n}(Z^{n}_{l_{n}})$  (2)

where $l_{n}$ indexes the neurons in layer $n$ and $g^{n}(x)$ is a nonlinear activation function. The input layer, $\mathbf{A}^{0}$, is given by the structural fingerprint of the local atomic environment. The output layer, $\mathbf{Z}^{N}$, always contains a single node representing the energy, so $\mathbf{W}^{N}$ is always a row vector and $\mathbf{B}^{N}$ a single number.
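As a concrete illustration of Eqs. 1 and 2, the following sketch (Python/NumPy, with a hypothetical 13-20-1 architecture and random weights used purely for demonstration) propagates a structural fingerprint through a small MLP to obtain a per-atom energy; the hidden-layer activation matches Eq. 13 given later in this section.

import numpy as np

def forward(fingerprint, weights, biases, activations):
    # Propagate the input layer A^0 (the structural fingerprint) through the
    # network using Z^n = W^n A^(n-1) + B^n (Eq. 1) and A^n = g^n(Z^n) (Eq. 2).
    A = fingerprint
    for W, B, g in zip(weights, biases, activations):
        Z = W @ A + B
        A = g(Z)
    return A.item()  # final layer has a single node: the atomic energy

# Hypothetical 13-20-1 network: 13 fingerprint values, one hidden layer of 20 neurons.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(20, 13)), rng.normal(size=(1, 20))]
biases = [rng.normal(size=20), rng.normal(size=1)]
hidden = lambda x: x / 10 + 0.9 * np.log(np.exp(x) + 1)  # Eq. 13
output = lambda x: x                                      # Eq. 12
energy = forward(rng.normal(size=13), weights, biases, [hidden, output])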
A number of different functions can be used to describe this structural fingerprint. Behler and Parrinello [15] consider two types of fingerprint functions: radial symmetry functions, which are a pairwise sum of Gaussians, and angular terms summed over all triplets of atoms. For the RANN style, however, we use the fingerprint style introduced by Dickel, Francis, and Barrett, motivated by the MEAM formalism [22, 31, 32], with the addition of angular screening. This style seems particularly relevant for metal potentials for the same physically motivated reasons that the MEAM formalism is effective. In this style, two different kinds of input fingerprints are considered. The first consists of simple pair interactions summed over all the neighbors of a given atom. For a given atom labeled $i$, we define a set of pair interactions of the form

$F_{n} = \sum_{j\neq i}\left(\frac{r_{ij}}{r_{e}}\right)^{n} e^{-\alpha_{n}\frac{r_{ij}}{r_{e}}} f_{c}\!\left(\frac{r_{c}-r_{ij}}{\Delta r}\right) S_{ij}$  (3)

where $j$ labels all the neighbor atoms of $i$ within a cutoff radius $r_{c}$, $n$ is an integer, different for each member of the pairwise contributions to the fingerprint, $r_{e}$ is the equilibrium nearest neighbor distance, $S_{ij}$ is an angular screening term described below, and the $\alpha_{n}$ are metaparameters which can be tuned to better optimize the potential. In principle, the $\alpha_{n}$ are related to the dimensionless $\alpha$ used in the MEAM formalism, which is in turn related to the bulk modulus of the material [32]. $f_{c}(x)$ is a cutoff function which smoothly varies the weight of the interaction from 1 for atoms inside the cutoff radius to 0 once $r_{ij} > r_{c}$, where $\Delta r$ defines the width of the transition. The cutoff function used here is the same as that employed in MEAM and is given by

$f_{c}(x) = \left\{\begin{array}{lr}1, & x > 1\\ \left(1-(1-x)^{4}\right)^{2}, & 0 \leq x \leq 1\\ 0, & x < 0\end{array}\right.$  (4)
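To make Eqs. 3 and 4 concrete, a minimal NumPy sketch of a single pair fingerprint $F_{n}$ is given below; the neighbor distances and the screening factors $S_{ij}$ (computed later in this section) are assumed to be supplied by the caller.

import numpy as np

def f_cut(x):
    # Smooth cutoff of Eq. 4: 1 for x > 1, 0 for x < 0, smooth ramp in between.
    x = np.clip(x, 0.0, 1.0)
    return (1.0 - (1.0 - x) ** 4) ** 2

def pair_fingerprint(r_ij, S_ij, n, alpha_n, r_e, r_c, delta_r):
    # F_n of Eq. 3 for one atom i, vectorized over its neighbor list.
    # r_ij: distances to neighbors j; S_ij: angular screening factors (Eq. 14).
    x = r_ij / r_e
    return np.sum(x ** n * np.exp(-alpha_n * x) * f_cut((r_c - r_ij) / delta_r) * S_ij)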

The second kind of fingerprint function considers three-body terms, with a form similar to the partial electron densities used in MEAM. There are two parameters, $m$ and $k$, which can be varied to generate more inputs for the fingerprint:

$G_{m,k} = \sum_{j,k\neq i}\cos^{m}\theta_{jik}\, e^{-\beta_{k}\frac{r_{ij}+r_{ik}}{r_{e}}} f_{c}\!\left(\frac{r_{c}-r_{ij}}{\Delta r}\right) f_{c}\!\left(\frac{r_{c}-r_{ik}}{\Delta r}\right) S_{ij} S_{ik}$  (5)

where $\theta_{jik}$ is the angle between $r_{ij}$ and $r_{ik}$, $m$ is a non-negative integer, and the $\beta_{k}$ are a set of metaparameters which determine the length scales of the different terms. For large numbers of neighbors, it can be more efficient to calculate this as a sum over a single list of neighbors as follows:

$G_{0,k} = \left(\sum_{j} e^{-\beta_{k}\frac{r_{ij}}{r_{e}}} f_{c}\!\left(\frac{r_{c}-r_{ij}}{\Delta r}\right) S_{ij}\right)^{2}$  (6)
$G_{1,k} = \sum_{\alpha_{1}}\left(\sum_{j}\frac{r^{\alpha_{1}}_{ij}}{r_{ij}} e^{-\beta_{k}\frac{r_{ij}}{r_{e}}} f_{c}\!\left(\frac{r_{c}-r_{ij}}{\Delta r}\right) S_{ij}\right)^{2}$  (7)
$G_{2,k} = \sum_{\alpha_{1},\alpha_{2}}\left(\sum_{j}\frac{r^{\alpha_{1}}_{ij} r^{\alpha_{2}}_{ij}}{r_{ij}^{2}} e^{-\beta_{k}\frac{r_{ij}}{r_{e}}} f_{c}\!\left(\frac{r_{c}-r_{ij}}{\Delta r}\right) S_{ij}\right)^{2}$  (8)
$G_{3,k} = \sum_{\alpha_{1},\alpha_{2},\alpha_{3}}\left(\sum_{j}\frac{r^{\alpha_{1}}_{ij} r^{\alpha_{2}}_{ij} r^{\alpha_{3}}_{ij}}{r_{ij}^{3}} e^{-\beta_{k}\frac{r_{ij}}{r_{e}}} f_{c}\!\left(\frac{r_{c}-r_{ij}}{\Delta r}\right) S_{ij}\right)^{2}$  (9)
$G_{m,k} = \sum_{\alpha_{1}}\sum_{\alpha_{2}}\cdots\sum_{\alpha_{m}}\left(\sum_{j}\frac{r^{\alpha_{1}}_{ij} r^{\alpha_{2}}_{ij} \cdots r^{\alpha_{m}}_{ij}}{r_{ij}^{m}} e^{-\beta_{k}\frac{r_{ij}}{r_{e}}} f_{c}\!\left(\frac{r_{c}-r_{ij}}{\Delta r}\right) S_{ij}\right)^{2}$  (10)

where $r^{\alpha_{m}}_{ij}$ is the $x$, $y$, or $z$ component of $r_{ij}$ for $\alpha_{m} = 1, 2$, or $3$, respectively. A simplification can be made by noting that many of the terms in the three-body interactions are redundant. In $G_{2,k}$, for example, there will be a calculation for $xy$ as well as for $yx$. By taking advantage of this redundancy, we can reduce the number of terms that need to be calculated for $G_{m,k}$ from $3^{m}$ to $\frac{(m+1)(m+2)}{2}$. The simplified expression for $G_{m,k}$ now reads:

$G_{m,k} = \sum_{\alpha_{1}\geq\alpha_{2}\geq\cdots\geq\alpha_{m}} \frac{m!}{n_{x}!\,n_{y}!\,n_{z}!} \left(\sum_{j}\frac{r^{\alpha_{1}}_{ij} r^{\alpha_{2}}_{ij} \cdots r^{\alpha_{m}}_{ij}}{r_{ij}^{m}} e^{-\beta_{k}\frac{r_{ij}}{r_{e}}} f_{c}\!\left(\frac{r_{c}-r_{ij}}{\Delta r}\right) S_{ij}\right)^{2}$  (11)

where $n_{x}$, $n_{y}$, and $n_{z}$ are the total number of $x$-, $y$-, and $z$-components in the set $(\alpha_{1}, \alpha_{2}, \ldots, \alpha_{m})$, respectively. Note that which of these forms (Eq. 5 or Eq. 11) is more efficient to calculate will depend on the length of the neighbor list and the magnitude of $m$. Structural fingerprints of arbitrary size can be constructed by varying the number of different $n$, $m$, and $k$ values used, giving more or less information to the ANN. For the potentials tested here, we have used consecutive values of $n$ and $m$, totaling $n_{tot}$ and $m_{tot}$ respectively ($n \in \{-1, 0, 1, \ldots, n_{tot}-2\}$, $m \in \{0, 1, \ldots, m_{tot}-1\}$). A particular ANN potential will consist of a fixed structural fingerprint, the number and length of the hidden layers, weight and bias matrices for each layer, and activation functions for each layer. For the potentials considered here, we have used the following activation functions:

$g^{N}(x) = x$  (12)
$g^{n}(x) = \frac{x}{10} + \frac{9}{10}\log(e^{x}+1) \quad \textrm{for } n < N$  (13)

The size of the weight and bias matrices will depend on the length of the fingerprint and the length of each hidden layer.
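A sketch of the simplified three-body fingerprint of Eq. 11 is given below (NumPy); it loops over the $(m+1)(m+2)/2$ unique component combinations and applies the multinomial weight $m!/(n_{x}!\,n_{y}!\,n_{z}!)$ to each squared neighbor sum. The neighbor vectors, screening factors, and metaparameters are assumed inputs, and f_cut is the cutoff of Eq. 4 as sketched earlier.

import numpy as np
from itertools import combinations_with_replacement
from math import factorial

def f_cut(x):
    # Cutoff of Eq. 4 (as in the earlier pair-fingerprint sketch).
    x = np.clip(x, 0.0, 1.0)
    return (1.0 - (1.0 - x) ** 4) ** 2

def three_body_fingerprint(r_vec, S_ij, m, beta_k, r_e, r_c, delta_r):
    # G_{m,k} of Eq. 11 for one atom i.
    # r_vec: (N, 3) displacement vectors to neighbors; S_ij: (N,) screening factors.
    r = np.linalg.norm(r_vec, axis=1)
    w = np.exp(-beta_k * r / r_e) * f_cut((r_c - r) / delta_r) * S_ij
    unit = r_vec / r[:, None]  # components r_ij^alpha / r_ij
    G = 0.0
    for combo in combinations_with_replacement(range(3), m):
        counts = [combo.count(axis) for axis in range(3)]          # (n_x, n_y, n_z)
        weight = factorial(m) / np.prod([factorial(c) for c in counts])
        inner = np.sum(np.prod(unit[:, list(combo)], axis=1) * w)  # single neighbor sum
        G += weight * inner ** 2
    return G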
As discussed above, one of the major computational barriers in the implementation of ANN methods is the large neighbor list included in the calculation of the structural fingerprint. In order to obtain good convergence with DFT results, cutoff radii larger than 12 Å are sometimes used [27], which for the ground state of Mg includes over 300 atoms. By comparison, recent MEAM potentials for Mg [33] have used a cutoff distance of 5.875 Å, which includes only 38 atoms in the ground state. Since the computation time scales at least linearly with the number of neighbors, minimizing the length of the neighbor list without sacrificing accuracy should be a key goal for optimal efficiency. The radial cutoff function $f_{c}$ can accomplish this, but leads to nonphysical results at large separations (see, for example, [33]). Angular screening, whereby the effective interaction between two atoms is reduced or eliminated by the presence of an atom located between them, can limit the neighbor list in a similar way without the unphysical results. Such a method is used in MEAM [34], and the same screening method has been employed here. Briefly, the screening between two atoms is determined from the product of all the screening interactions by other atoms in the neighborhood:

$S_{ij} = \prod_{k\neq i,j} S_{ikj}$  (14)

where $S_{ikj}$ is calculated from a geometric construction based on the ellipse passing through the three atoms with $r_{ij}$ as one of its axes. The screening parameter $C_{ikj}$ is given by:

$C_{ikj} = 1 + 2\,\frac{r^{2}_{ij}r^{2}_{ik} + r^{2}_{ij}r^{2}_{jk} - r^{4}_{ij}}{r^{4}_{ij} - (r^{2}_{ik} - r^{2}_{jk})^{2}}$  (15)

and the screening value is

$S_{ikj} = f_{c}\!\left(\frac{C_{ikj} - C_{min}}{C_{max} - C_{min}}\right)$  (16)

where $f_{c}$ is the same cutoff function used for the radial cutoff, and $C_{max}$ and $C_{min}$ are metaparameters which can be tuned to determine which neighbors can be excluded from the calculations.
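A direct transcription of Eqs. 14-16 is sketched below (NumPy); the handling of degenerate geometries, for which the denominator of Eq. 15 vanishes, is omitted for brevity and would need care in a production implementation.

import numpy as np

def f_cut(x):
    # Cutoff of Eq. 4, reused as the screening ramp of Eq. 16.
    x = np.clip(x, 0.0, 1.0)
    return (1.0 - (1.0 - x) ** 4) ** 2

def screening(r_i, r_j, r_k, c_min, c_max):
    # Total screening S_ij of the i-j pair by all other neighbors k (Eq. 14).
    # r_i, r_j: positions of atoms i and j; r_k: (N, 3) positions of candidate screeners.
    r_ij2 = np.sum((r_j - r_i) ** 2)
    r_ik2 = np.sum((r_k - r_i) ** 2, axis=1)
    r_jk2 = np.sum((r_k - r_j) ** 2, axis=1)
    # Elliptical screening parameter C_ikj of Eq. 15
    C = 1.0 + 2.0 * (r_ij2 * r_ik2 + r_ij2 * r_jk2 - r_ij2 ** 2) / (r_ij2 ** 2 - (r_ik2 - r_jk2) ** 2)
    S_ikj = f_cut((C - c_min) / (c_max - c_min))  # Eq. 16
    return np.prod(S_ikj)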
The effect of including angular screening can be demonstrated by considering how an individual fingerprint changes as the length scale is varied continuously. For this example, we consider the value of a single fingerprint for a Mg atom in a perfect bulk hcp lattice as the lattice parameter is changed continuously. In the absence of angular screening, the value of the fingerprint changes rapidly at particular values of the lattice constant as new neighbors enter the radial cutoff distance. If angular screening is included with metaparameters such that, for example, only third nearest neighbors are ever included regardless of lattice parameter, the change in the value is considerably smoother. Figure 1 demonstrates the difference between the two cases. Not only is the unscreened value more expensive to calculate, since it requires more neighbors in most cases, but the predictive power of the model is also hampered: two physically similar states, ideal lattices differing only in lattice constant, are given markedly different values and trends as the number of neighbors changes. This can be overcome by the action of the neural network through precise cancellation among different fingerprint terms, but the quality of fit and the extrapolative power of the screened functions appear more natural. Indeed, unphysical deviations at relatively moderate isotropic compressions, as seen for example in [27], can be avoided entirely through angular screening, as the fingerprint values change continuously in the screened case even at high compression.


Figure 1: A sample value for a fingerprint in the presence (orange) and absence (blue) of angular screening as the volume of a cell is changed isotropically. Because the neighbor list frequently changes without angular screening, the curve is less smooth, requiring more data to provide predictive information to the ANN.

For the full network, forces are calculated by taking the gradient of the energy of the entire system. This includes the individual ANN contributions from every atom in the system, so the force on atom $i$ is determined not only by the ANN output of that atom, but by those of all of its neighbors as well.

3 Numerical Implementation

The ANN potential file written for LAMMPS includes all of the details of the neural network. It begins with a header which lists the types and numbers of fingerprints to be used. While only the two varieties described above, the radial terms $F_{n}$ and the bond (three-body) terms $G_{m,k}$, are included in this work, the LAMMPS potential style can be expanded in a straightforward way to include other descriptors. For each of the types used, the potential file then specifies the number of fingerprints and any metaparameters. For $F_{n}$ this includes the range of values for $n$ as well as the metaparameters $\alpha_{n}$, $r_{c}$, $C_{max}$, $C_{min}$, and $r_{e}$. For $G_{m,k}$, $m_{tot}$ and $k_{tot}$ are given along with the metaparameters $\beta_{k}$, $r_{e}$, $r_{c}$, $C_{min}$, and $C_{max}$. Note that while we use the same cutoff radius and angular screening parameters for both varieties, they can in principle be different, as every fingerprint type has its own set of metaparameters. Next, the network architecture is specified by giving the total number of layers and the length of each layer. Figure 2 shows a sample header for the potential file. Following the header are the weight and bias matrices for each layer and, finally, the activation functions used for each layer. Using the potential file, the ANN subroutine calculates the feature list for every atom and uses it to calculate the energy and force on every atom in the usual way. The subroutine and potential file also have the capability of assigning different ANNs to different atom types for use in multi-element simulations. In this way, the framework can be used for arbitrary structural fingerprints as inputs to arbitrary MLP architectures for systems with an arbitrary number of element types.
The subroutine is included as a normal pair-style in LAMMPS called ‘rann’ and can be called in the usual way:

pair_style rann
pair_coeff * * potential_file.nn Element1 Element2 …


Figure 2: Sample header for the neural network potential style. The header specifies the different element types as well as the style and number of fingerprints used for each type. Metaparameters needed for each style are also included as well as the overall network architecture.

4 Validation and Benchmarking

With the pair style implemented in LAMMPS, several potentials were fit to a reference database for magnesium (Mg). The database, containing over 300,000 atomic environments, was constructed using the open-source density functional theory (DFT) software Quantum Espresso [35]. The configurations used to train the networks are shown in Table 1. These configurations included simple triaxial strain perturbations of the low energy SC, FCC, BCC, and HCP phases of Mg along with larger supercells with atoms displaced from equilibrium to mimic the effect of thermal excitation. Free surface and vacancy configurations, also with atoms perturbed from their equilibrium positions, were included to improve stability. Effective temperatures up to 1000 K were included in the thermally excited database. Training of the network was carried out by minimizing the RMSE of the system energies compared to the DFT results using the Levenberg-Marquardt method [36, 37]. 10% of the training data from each configuration type was left out of training and used for validation. The most significant variation among the fitted potentials came in the size and construction of the fingerprint used as the input layer. Along with the number and type of fingerprints included, the most important factor in determining accuracy and efficiency was the number of atoms included in the neighbor list, which is a function of the cutoff radius $r_{c}$ and the angular screening parameter $C_{min}$. Table 2 summarizes the different fingerprints considered and the RMSE on both the training and validation datasets for each potential. As can be seen, even the smallest architectures had an RMSE of less than 3 meV/atom and the best were below 1 meV/atom. Table 2 also includes the computational performance for a sample calculation which is described in detail below.
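Although training is carried out offline rather than in LAMMPS, a single Levenberg-Marquardt update [36, 37] has the familiar damped Gauss-Newton form. The sketch below is schematic only; residual_and_jacobian is an assumed user-supplied routine returning the per-configuration energy residuals (ANN minus DFT) and their Jacobian with respect to the flattened weights and biases.

import numpy as np

def lm_step(params, residual_and_jacobian, damping):
    # One Levenberg-Marquardt update for least-squares fitting of the network.
    r, J = residual_and_jacobian(params)          # residuals and Jacobian dr/dparams
    A = J.T @ J + damping * np.eye(params.size)   # damped normal equations
    step = np.linalg.solve(A, J.T @ r)
    rmse = np.sqrt(np.mean(r ** 2))               # energy RMSE of current parameters
    return params - step, rmse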

Table 1: DFT simulations used to generate training and validation data for the Mg potential. The training database attempts to cover all relevant low energy structures which may be encountered during MD simulations.
Sample description | Atoms per simulation | Number of simulations | Total atomic environments
SC cubic cell w/ strains up to ±15% | 1 | 500 | 500
FCC primitive cell w/ strains up to ±15% | 1 | 500 | 500
BCC primitive cell w/ strains up to ±15% | 1 | 500 | 500
HCP unit cell w/ strains up to ±10% | 2 | 2700 | 5400
FCC 2x2x2 orthogonal supercell | 32 | 500 | 16000
BCC 3x3x3 orthogonal supercell | 54 | 500 | 27000
HCP 3x3x3 primitive supercell | 54 | 2000 | 108000
HCP 4x3x3 primitive supercell with vacancy | 67 | 100 | 6700
HCP (0001) free surface | 54 | 500 | 27000
HCP $(10\overline{1}0)$ free surface | 54 | 100 | 5400
Totals | | 8100 | 307000
Table 2: Input fingerprints considered in this work. $k_{tot}$ and $m_{tot}$ are the total number of values used for $k$ and $m$ in Eq. 11. For the pair-style input, $n_{tot}$ was always 5. All ANNs contained a single hidden layer of 20 neurons. #SF is the total number of structural fingerprints for a particular potential, and $r_{c}$ and $C_{min}$ are the radial cutoff and angular screening parameter that determine the number of neighbors considered in the calculation of the fingerprints. #NN is the number of neighbors included in a calculation of bulk HCP Mg at the equilibrium lattice constant. RMSE values are for the full training and validation databases and are given in meV/atom. Calculation speed is given in timesteps/second. The starred entry was used for the validation tests.
$k_{tot}$ | $m_{tot}$ | #SF | $r_{c}$ (Å) | $C_{min}$ | #NN | Training RMSE | Validation RMSE | Calculation speed
3 | 4 | 13 | 6 | 0.49 | 20 | 0.90 | 2.25 | 21.77 *
3 | 4 | 13 | 8 | 0.49 | 20 | 0.67 | 1.64 | 15.74
3 | 4 | 13 | 10 | 0.49 | 20 | 0.69 | 1.71 | 10.71
2 | 4 | 13 | 12 | 0.25 | 38 | 0.89 | 1.71 | 5.56
2 | 4 | 13 | 12 | 0.49 | 20 | 1.09 | 1.05 | 8.26
2 | 4 | 13 | 12 | 0.70 | 18 | 0.93 | 1.08 | 9.34
3 | 4 | 17 | 12 | 0.25 | 38 | 0.80 | 1.03 | 5.07
3 | 4 | 17 | 12 | 0.49 | 20 | 0.71 | 1.38 | 8.04
3 | 4 | 17 | 12 | 0.70 | 18 | 0.80 | 1.06 | 8.75
5 | 3 | 20 | 12 | 0.25 | 38 | 1.57 | 2.95 | 4.91
5 | 3 | 20 | 12 | 0.49 | 20 | 2.80 | 3.08 | 7.88
5 | 3 | 20 | 12 | 0.70 | 18 | 2.88 | 3.10 | 8.32
MEAM [33] | | | | | | | | 61.963
N2P2 [27] | | | | | | | | 5.12

4.1 Validation of the potential

Tests of the validity of the potential and its implementation in LAMMPS were all performed using the 13-member input fingerprint with a single hidden layer containing 20 neurons. The metaparameters for this potential are shown in Table 3. $C_{min}$ was chosen to exclude atoms beyond the third nearest neighbors in the equilibrium hcp Mg structure. First, to confirm agreement with first principles results beyond the RMSE in energy, the bulk properties of hcp magnesium were tested for this potential. Table 4 shows a comparison of these bulk properties among the sample ANN, the first principles database, and experimental results. Agreement can be seen on the simple bulk properties targeted.

Table 3: Metaparameters of the benchmark potential used for validation.
$n_{tot}$ | 5
$k_{tot}$ | 3
$m_{tot}$ | 4
$\alpha$ | 5.52
$\beta_{k}$ | 1, 2, 9
$r_{c}$ | 6 Å
$C_{min}$ | 0.49
$C_{max}$ | 2.90

As an initial test of the fidelity of the implementation, a number of simple structures were generated and tested to confirm that the forces and energies were correct and in agreement with one another. As a simple test, we displace a single atom from its equilibrium location in a minimized HCP Mg lattice. Figure 3 shows the force on the displaced atom as a function of displacement as predicted by the ANN and by numerical differentiation of the energy. Agreement is seen between the two measures demonstrating that forces are correctly predicted by the algorithm.
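The cross-check behind Figure 3 amounts to a central-difference derivative of the total energy with respect to the displaced coordinate. A minimal sketch is given below, assuming an energy_fn callable that re-evaluates the system energy (for example, through a LAMMPS single-point calculation) for a given set of atomic positions.

import numpy as np

def numerical_force(energy_fn, positions, atom, axis, h=1e-5):
    # Central-difference force component -dE/dx on one atom, for comparison
    # with the analytic force reported by the pair style.
    plus = positions.copy()
    minus = positions.copy()
    plus[atom, axis] += h
    minus[atom, axis] -= h
    return -(energy_fn(plus) - energy_fn(minus)) / (2.0 * h)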

Table 4: Properties of bulk magnesium as determined by experiment, the DFT database used for the fitting of the potential, and the 13-20-1 neural network used in this study.
Property | Experiment | DFT | ANN
$a$ (Å) | 3.209 [38] | 3.194 | 3.196
$c$ (Å) | 5.211 [38] | 5.184 | 5.189
$c/a$ | 1.623 | 1.623 | 1.623
$E_{c}$ (eV) | 1.51 [38] | 1.45 | 1.46
$C_{11}$ (GPa) | 63.5 [39] | 61.6 | 63.4
$C_{12}$ (GPa) | 25.9 [39] | 23.8 | 24.3
$C_{13}$ (GPa) | 21.7 [39] | 21.2 | 18.5
$C_{33}$ (GPa) | 66.5 [39] | 64.3 | 62.9
$C_{44}$ (GPa) | 18.4 [39] | 17.3 | 18.4
Figure 3: The force as determined by the numerical derivative of the system energy (solid red) and the calculated force (dashed black) as a single atom is displaced in the $[\overline{1}\overline{1}20]$ direction in a 4x4x4 hcp Mg lattice. (inset) Difference between the derivative and the direct force calculation.

As a second test, a periodic fcc cubic unit cell was generated and allowed to relax following a conjugate gradient minimization scheme. The symmetry of the structure is maintained throughout minimization. The simulated cell is then uniaxially compressed by 10% in the x direction and slowly pulled in tension to a total strain of +10%, and the changes in the energy and stress tensor are examined. The results are shown in Figure 4. We see, again, that the numerical derivative of the energy as a function of strain agrees with the stress calculated by the algorithm.
Having demonstrated that the pair style correctly reproduces the energy, forces, and stresses in static configurations, we now confirm that dynamic behavior can also be correctly produced. A dynamic simulation was carried out with a 256 atom periodic hcp system (4x4x4 orthogonal unit cells). The system was initialized from the equilibrium ground state with atoms given a random velocity distribution producing an average temperature of 300K. The system was then allowed to evolve at constant energy and volume. Figure 5 shows the evolution of the kinetic, potential and total energy of the system. We see that total energy is conserved.
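An equivalent energy-conservation test can be scripted through the LAMMPS Python interface. The sketch below is only an approximation of the setup described above: the potential file name Mg.nn is a placeholder, the lattice constant is taken from Table 4, and the hcp c/a ratio is left at the LAMMPS ideal default.

from lammps import lammps

lmp = lammps()
commands = """
units metal
boundary p p p
lattice hcp 3.196
region box block 0 4 0 4 0 4
create_box 1 box
create_atoms 1 box
mass 1 24.305
pair_style rann
pair_coeff * * Mg.nn Mg
velocity all create 300.0 12345 dist gaussian
fix 1 all nve
thermo 100
thermo_style custom step temp pe ke etotal
run 1000
"""
for line in commands.strip().splitlines():
    lmp.command(line)  # potential, kinetic, and total energy are logged every 100 steps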

Figure 4: The x component of the stress tensor (solid red) and the derivative of the system energy (dashed black) as uniaxial tension/compression is applied in the x direction. (inset) Difference between the derivative and stress calculations.
Figure 5: The potential (red), kinetic (black), and total (blue) energy relative to the 0 K minimum for a 4x4x4 hcp lattice of magnesium atoms with an initial velocity distribution producing a temperature of 300 K. The system was allowed to evolve under NVE conditions. Conservation of energy can be observed for this system.

4.2 Computational Efficiency

While ANN potentials have the capability to reproduce experimental or first principles results with greater accuracy than traditional formalisms, if the computational performance suffers too greatly, the benefits become negligible. The speed at which the ANN pair style will perform depends on the size and complexity of the structural fingerprint, the number of neighbors within the cutoff radius and the network architecture.
As a benchmark of the performance, we compare the Mg ANN potential to a magnesium MEAM potential [33] and a recent n2p2 style neural network potential [27]. To test the speed of the MEAM potential and the various NN potentials, a sample system of hcp magnesium containing 4000 atoms was minimized and each atom given a random velocity matching a distribution with a temperature of 300 K. The system was then allowed to evolve at constant energy and volume for 1000 timesteps. The number of timesteps calculated per second is shown in Table 2. Calculations were performed on one core of an Intel Xeon E5-2690 2.0 GHz processor. We see that all of the potentials are slower than MEAM, with a strong dependence on the number of neighbors considered and on the cutoff radius $r_{c}$, but the fastest potential, used in the validation study, runs approximately 1/3 as fast as MEAM and four times faster than competitive n2p2 potentials. As can be seen in the first rows of Table 2, there is a strong dependence on the cutoff radius even when the angular screening parameter keeps the number of included atoms the same. This is due to the computational time required to calculate the angular screening needed to remove atoms from the neighbor list. Even so, this is more efficient than considering all of the atoms within the larger cutoff radius.
For the size of networks considered here, the computation time is dominated by the calculation of the structural fingerprint, with propagation through the network becoming a limiting factor only for the smallest fingerprints and largest architectures. This suggests that, assuming the structural fingerprint contains sufficient information to describe the relevant configurations, it is more efficient in this formalism to increase the size of the network rather than to expand the fingerprint in order to improve accuracy. This is particularly true of extending the cutoff radius. While a large cutoff radius is often desirable and is used in existing ANN potentials, as it should improve the smoothness of the potential as atoms move away from the target atom, the high computational cost of increasing the neighbor list suggests that ideally only first, second, and possibly third neighbors should normally be considered for metals. Angular screening combines both of these aspects, limiting the number of neighbors in bulk settings while still allowing relatively long-range interactions for studies involving voids, free surfaces, or fracture.
The RANN formalism presented here requires computer time similar to that of MEAM to run simulations using the developed potentials and, thanks to the addition of angular screening, runs faster than other existing ANN formalisms. At the same time, the development time should be considerably smaller, as fitting to an arbitrary number of targets is automated, with many free parameters available to tune to all available data.

5 Conclusions

We have presented an implementation of the rapid artificial neural network (RANN) interatomic potential pair style for use in the LAMMPS molecular dynamics code. The potential style is suitable for static and dynamic calculations, conserving total energy and correctly calculating the pressures and forces of bulk solids. Such potential styles have been shown to be capable of reproducing training data from first principles calculations at a level which exceeds previous semi-empirical formalisms [6, 7, 10]. The developed pair style can accommodate network architectures of arbitrary dimension and activation function, and the calculation of the structural fingerprint requires only singly looped summations over the neighbor list of a target atom, improving computational efficiency. Angular screening has also been introduced, both to improve efficiency by limiting the number of included neighbors and to improve predictive power by incorporating the physically motivated phenomenon of shielding and by smoothing the fingerprints as atoms move through the radial cutoff. Computational efficiency is of particular importance for this formalism, as the improvements in accuracy over existing formalisms such as MEAM are greatly diminished if the runtime approaches that of first principles calculations. We demonstrate that, for a network with 13 input fingerprints, a cutoff radius of 6.0 Å for the neighbor list, and angular screening restricting interactions to third nearest neighbors in bulk HCP, the implementation performs at one third the speed of MEAM. Larger networks, fingerprints, or neighbor lists will reduce this performance, with the most significant cost associated with growing the neighbor list through the length of the radial cutoff. Ultimately, a flexible, scalable ANN formalism which can utilize the native parallelism available in LAMMPS has been demonstrated, producing high accuracy potentials which operate at speeds comparable to those of MEAM.

References


  • [1] Hongxiang Zong, Ghanshyam Pilania, Xiangdong Ding, Graeme J Ackland, and Turab Lookman. Developing an interatomic potential for martensitic phase transformations in zirconium by machine learning. npj Computational Materials, 4(1):48, 2018.
  • [2] N. Artrith and A. Urban. An implementation of artificial neural-network potentials for atomistic materials simulations: Performance for tio2. Computational Materials Science, 114:135–150, 2016.
  • [3] Gabriele C Sosso, Giacomo Miceli, Sebastiano Caravati, Jörg Behler, and Marco Bernasconi. Neural network interatomic potential for the phase change material gete. Physical Review B, 85(17):174103, 2012.
  • [4] Jörg Behler. Representing potential energy surfaces by high-dimensional neural network potentials. Journal of Physics: Condensed Matter, 26(18):183001, 2014.
  • [5] Daniele Dragoni, Thomas D Daff, Gábor Csányi, and Nicola Marzari. Achieving dft accuracy with a machine-learning interatomic potential: Thermomechanics and defects in bcc ferromagnetic iron. Physical Review Materials, 2(1):013808, 2018.
  • [6] Atsuto Seko, Akira Takahashi, and Isao Tanaka. First-principles interatomic potentials for ten elemental metals via compressed sensing. Physical Review B, 92(5):054113, 2015.
  • [7] Tran Doan Huan, Rohit Batra, James Chapman, Sridevi Krishnan, Lihua Chen, and Rampi Ramprasad. A universal strategy for the creation of machine learning-based atomistic force fields. NPJ Computational Materials, 3(1):37, 2017.
  • [8] Volker L Deringer and Gábor Csányi. Machine learning based interatomic potential for amorphous carbon. Physical Review B, 95(9):094203, 2017.
  • [9] Albert P Bartók, James Kermode, Noam Bernstein, and Gábor Csányi. Machine learning a general-purpose interatomic potential for silicon. Physical Review X, 8(4):041048, 2018.
  • [10] Ryo Kobayashi, Daniele Giofré, Till Junge, Michele Ceriotti, and William A Curtin. Neural network potential for al-mg-si alloys. Physical Review Materials, 1(5):053604, 2017.
  • [11] A. Takahashi, A. Seko, and I. Tanaka. Conceptual and practical bases for the high accuracy of machine learning interatomic potentials: Application to elemental titanium. Physical Review Materials, 1:063801, 2017.
  • [12] G. P. P. Pun and Y. Mishin. Optimized interatomic potential for silicon and its application to thermal stability of silicene. Physical Review B, 95(22):224103, 2017.
  • [13] Aidan P Thompson, Laura P Swiler, Christian R Trott, Stephen M Foiles, and Garritt J Tucker. Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials. Journal of Computational Physics, 285:316–330, 2015.
  • [14] Jörg Behler. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. The Journal of Chemical Physics, 134:074106, 2011.
  • [15] Jörg Behler and Michele Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical review letters, 98(14):146401, 2007.
  • [16] Albert P Bartók, Mike C Payne, Risi Kondor, and Gábor Csányi. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Physical review letters, 104(13):136403, 2010.
  • [17] Wojciech J Szlachta, Albert P Bartók, and Gábor Csányi. Accuracy and transferability of gaussian approximation potential models for tungsten. Physical Review B, 90(10):104108, 2014.
  • [18] So Fujikake, Volker L Deringer, Tae Hoon Lee, Marcin Krynski, Stephen R Elliott, and Gábor Csányi. Gaussian approximation potential modeling of lithium intercalation in carbon nanostructures. The Journal of chemical physics, 148(24):241714, 2018.
  • [19] D. Dragoni, T. D. Daff, G. Csányi, and N. Marzari. Achieving dft accuracy with a machine-learning interatomic potential: Thermomechanics and defects in bcc ferromagnetic iron. Physical Review Materials, 2(1):013808, 2018.
  • [20] Thuong T Nguyen, Eszter Székely, Giulio Imbalzano, Jörg Behler, Gábor Csányi, Michele Ceriotti, Andreas W Götz, and Francesco Paesani. Comparison of permutationally invariant polynomials, neural networks, and gaussian approximation potentials in representing water interactions through many-body expansions. The Journal of chemical physics, 148(24):241725, 2018.
  • [21] Nongnuch Artrith, Alexander Urban, and Gerbrand Ceder. Efficient and accurate machine-learning interpolation of atomic energies in compositions with many species. Physical Review B, 96(1):014112, 2017.
  • [22] M. I. Baskes. Modified embedded-atom potentials for cubic materials and impurities. Physical Review B, 46(5):2727, 1992.
  • [23] D Dickel, DK Francis, and CD Barrett. Neural network aided development of a semi-empirical interatomic potential for titanium. Computational Materials Science, 171:109157, 2020.
  • [24] Mashroor S Nitol, Doyl E Dickel, and Christopher D Barrett. Artificial neural network potential for pure zinc. Computational Materials Science, 188:110207, 2021.
  • [25] Andreas Singraber, Jörg Behler, and Christoph Dellago. Library-based lammps implementation of high-dimensional neural network potentials. Journal of chemical theory and computation, 15(3):1827–1840, 2019.
  • [26] Daniel Marchand, Abhinav Jain, Albert Glensk, and WA Curtin. Machine learning for metallurgy i. a neural-network potential for al-cu. Physical Review Materials, 4(10):103601, 2020.
  • [27] Markus Stricker, Binglun Yin, Eleanor Mak, and WA Curtin. Machine learning for metallurgy ii. a neural-network potential for magnesium. Physical Review Materials, 4(10):103602, 2020.
  • [28] M. S. Daw and M. I. Baskes. Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals. Physical Review B, 29:6443, 1984.
  • [29] S. J. Plimpton. Fast parallel algorithms for short-range molecular dynamics. Journal of Computational Physics, 117(1):1–19, 1995.
  • [30] Jörg Behler. Perspective: Machine learning potentials for atomistic simulations. The Journal of chemical physics, 145(17):170901, 2016.
  • [31] M. I. Baskes. Determination of modified embedded atom method parameters for nickel. Mater. Chem. Phys., 50:152, 1997.
  • [32] B.-J. Lee and M. I. Baskes. Second nearest-neighbor modified embedded atom method potential. Physical Review B, 62:8564, 2000.
  • [33] Z Wu, MF Francis, and WA Curtin. Magnesium interatomic potential for simulating plasticity and fracture phenomena. Modelling and Simulation in Materials Science and Engineering, 23(1):015004, 2015.
  • [34] Michael I Baskes. Determination of modified embedded atom method parameters for nickel. Materials Chemistry and Physics, 50(2):152–158, 1997.
  • [35] Paolo Giannozzi, Stefano Baroni, Nicola Bonini, Matteo Calandra, Roberto Car, Carlo Cavazzoni, Davide Ceresoli, Guido L Chiarotti, Matteo Cococcioni, Ismaila Dabo, et al. Quantum espresso: a modular and open-source software project for quantum simulations of materials. Journal of physics: Condensed matter, 21(39):395502, 2009.
  • [36] Kenneth Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164–168, jul 1944.
  • [37] Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, jun 1963.
  • [38] N Saunders, AP Miodownik, and AT Dinsdale. Metastable lattice stabilities for the elements. Calphad, 12(4):351–374, 1988.
  • [39] G Simmons and H Wang. Single crystal elastic constants and calculated aggregate properties: A handbook 2nd ed., 370, 1971.