A representation-independent electronic charge density database for crystalline materials

Jimmy-Xuan Shen1, Jason M. Munro2, Matthew K. Horton1,3,
Patrick Huck2, Shyam Dwaraknath3, Kristin A. Persson1,3,*
Abstract

In addition to being the core quantity in density functional theory, the charge density can be used in many tertiary analyses in materials sciences from bonding to assigning charge to specific atoms. The charge density is data-rich since it contains information about all the electrons in the system. With increasing utilization of machine-learning tools in materials sciences, a data-rich object like the charge density can be utilized in a wide range of applications. The database presented here provides a modern and user-friendly interface for a large and continuously updated collection of charge densities as part of the Materials Project. In addition to the charge density data, we provide the theory and code for changing the representation of the charge density which should enable more advanced machine-learning studies for the broader community.

1. Department of Materials Science and Engineering, University of California, Berkeley, Berkeley, California 94720, United States

2. Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, 94720, United States

3. Energy Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, 94720, United States

*corresponding author(s): Kristin A. Persson ([email protected])

Background & Summary

The application of Density Functional Theory (DFT) to many-electron systems has witnessed tremendous growth in the past few decades and has now become the de facto simulation tool for physicists, chemists, and materials scientists. The central concept of DFT is that the energy, and in turn all of the physical properties of a quantum system, are completely determined by the electronic charge density of the ground state $\rho(\mathbf{r})$, with $\mathbf{r}$ the position vector 1. The majority of the computational cost in typical DFT calculations is associated with determining $\rho$ via an iterative algorithm to arrive at a self-consistent charge density 2. For the most commonly used exchange-correlation functionals, like the local density approximation (LDA) 2; 3 and the semi-local functional by Perdew-Burke-Ernzerhof (PBE) 4, a converged charge density can be used as the starting point for more expensive calculations such as obtaining a detailed band structure 5 or calculating the optical response of the material 6.

In addition to its central role in standard DFT calculations, the charge density itself is also useful for the analysis of many materials properties. The critical points of the charge density (i.e. where the gradient is zero) are often used as a boundary between atomic neighborhoods. In turn, this allows for a systematic assignment of charge to specific atoms 7; 8, as well as the determination of bonding character between neighboring pairs 9. Within the realm of energy materials, the charge density can be used as an effective potential to study the migration properties of Li in solid-state materials, as low charge density provides a metric of “free” space in a lattice 10; 11. Consequently, the local minima of the charge density can act as initial guesses for the positions of inserted cations 12.

A single DFT calculation of the primitive unit cell provides one representation of the charge density within that particular basis set. However, depending on the data application, alternative representations might be desired. An important example is machine learning (ML), where a consistent data representation is essential for deep-learning methods, yet the representation of the charge density is heavily influenced by the simulation cell and the Bravais lattice of the periodic structure. Hence, a necessary step in using electronic charge densities in machine-learning applications is the ability to obtain alternative representations of the same physical density. While recent work has examined the effectiveness of representations in Fourier space 13, any ML investigation of local interactions (e.g. adsorption and intercalation of ions) requires flexible representations in real space. Towards that end, our database will provide code to obtain arbitrary real-space representations of the charge density for a given material directly from a DFT-computed charge density.

The charge density of any crystalline solid, and indeed any periodic field, is naturally represented in a plane-wave basis set, where the inherent periodicity of the system is embedded in the underlying representation. For a sufficiently converged finite plane-wave basis, the continuous charge density $\rho(\mathbf{r})$ and its Fourier transform $\phi(\mathbf{k})$ can be accurately sampled by a three-dimensional array indexed by $i$, $j$, and $k$ with $N_{1}$, $N_{2}$, and $N_{3}$ evenly spaced grid points along each lattice vector, and can be converted from one to the other via a discrete Fourier transform

$$\rho(\mathbf{r})\equiv(\mathbf{a}_{1},\mathbf{a}_{2},\mathbf{a}_{3},\rho_{i,j,k})\xLeftrightarrow[\mathcal{F}^{-1}]{\mathcal{F}}(\mathbf{b}_{1},\mathbf{b}_{2},\mathbf{b}_{3},\phi_{i,j,k})\equiv\phi(\mathbf{k})\,, \tag{1}$$

where the $\mathbf{a}_{\alpha}$ and $\mathbf{b}_{\alpha}$ represent the real and reciprocal lattice vectors, and $i$, $j$, and $k$ are the indices of regularly spaced grid points along the lattice vectors. Due to the discrete nature of numerical Fourier transforms, the number of grid points of a real-space representation is equal to the number of plane waves needed to represent the data in reciprocal space.
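As a small illustration of Eq. (1), the sketch below uses NumPy's FFT on a synthetic periodic field. The field, the grid size, and all variable names are assumptions made for the example and are not taken from the database. It shows that the reciprocal-space array has exactly as many coefficients as there are real-space grid points, and that the two arrays can be converted into each other without loss.

```python
import numpy as np

# Hypothetical stand-in for a charge density: a smooth periodic field
# sampled on an N1 x N2 x N3 grid in fractional coordinates.
n1, n2, n3 = 8, 10, 12
u, v, w = np.meshgrid(
    np.arange(n1) / n1, np.arange(n2) / n2, np.arange(n3) / n3, indexing="ij"
)
rho = 1.0 + 0.5 * np.cos(2 * np.pi * u) + 0.25 * np.sin(2 * np.pi * (v + w))

# Forward transform: one complex coefficient per real-space grid point.
phi = np.fft.fftn(rho)

# Inverse transform: recovers the real-space samples up to round-off.
rho_back = np.fft.ifftn(phi).real

same_shape = phi.shape == rho.shape
max_roundtrip_error = float(np.max(np.abs(rho_back - rho)))

# With NumPy's normalization the zero-frequency coefficient is the grid sum,
# so dividing by the number of points gives the mean of the field.
mean_from_phi = float(phi[0, 0, 0].real / rho.size)
```

The lattice vectors do not enter the transform itself; they only fix how the grid indices map to positions in real and reciprocal space.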

A representation of the charge density is uniquely determined by three vectors and a scalar matrix either in real or reciprocal space. Each representation only provides a “view” of the infinite periodic data represented in a specific unit cell, and an infinite number of such representations exist for a given charge density. Regardless of the grid size and the periodic cell representation, the DFT-computed charge density represents the same underlying field. Nevertheless, charge densities are routinely recomputed whenever the representation needs to change, even when the computational parameters are unchanged, often at considerable expense. One common example is the use of the electrostatic potential of a supercell to correct for the periodic image effects of a charged defect 14.

Due to the significant amount of computational resources devoted to computing the electronic charge densities, as well as the growing domains of their application especially for the training of data-intensive machine learning models, there is a pressing need for a large-scale representation-independent database of charge densities. The Materials Project (https://materialsproject.org) - as a rapidly growing (currently more than 170,000 users) materials informatics resource - is a natural platform for the dissemination of such data. The work presented in this article provides details on how the charge densities in our database are computed and how they can be accessed. In addition, we provide a high-level API for querying and post-processing of the charge density data. Among other features, the API will allow users to take an existing atomic structure and query for charge density of the same material, in the representation/view of the user’s choosing.

Methods

In this section, we provide details on the scope of the charge densities database (CDD) and the precise set of computational parameters used to generate the data. Additionally, we will demonstrate features of the API that allow users to obtain arbitrary views of the charge density data including up-sampling/compressing the data via Fourier analysis and symmetry operations like translations, rotations, and super-cell transformations.

Calculation parameters

The charge densities are obtained from DFT calculations run using the static calculation workflow within the atomate software package 15, and relaxed input structures from the Materials Project (MP) database 16. The projector-augmented wave (PAW) method as implemented in the plane-wave Vienna Ab-initio Simulation Package (VASP) is used in conjunction with the PBE generalized-gradient approximation functional 4. The default set of MP calculation input parameters, which has been demonstrated to produce well-converged results 17, was used. Included in these parameters are an energy cutoff of 520 eV, a total energy error threshold of $5\times 10^{-5}$ eV/atom, and a reciprocal $k$-point density of $100/\text{Å}^{-3}$. The only addition made to the input set is to enable aspherical contributions in the gradient corrections inside the PAW spheres. Hubbard $U$ corrections are included for materials containing oxygen and fluorine. For the elements Co, Cr, Fe, Mn, Mo, Ni, V, and W, the $U$ values are 3.32, 3.70, 5.30, 3.90, 4.38, 6.20, 3.25, and 6.20 eV, respectively.
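For convenience, the parameters quoted above can be collected in a small Python structure. This is only a sketch: the key names follow common VASP and pymatgen spellings but are assumptions here, the helper `hubbard_u_for` is hypothetical, and the rule "apply $U$ when the material contains O or F" is our reading of the sentence above rather than a statement of the exact workflow logic.

```python
# Values quoted in the text; the dictionary keys are assumed spellings.
calc_params = {
    "ENCUT": 520,            # plane-wave energy cutoff (eV)
    "EDIFF_PER_ATOM": 5e-5,  # total-energy error threshold (eV/atom)
    "LASPH": True,           # aspherical gradient corrections in PAW spheres
}

# Hubbard U values (eV) listed in the text.
hubbard_u = {
    "Co": 3.32, "Cr": 3.70, "Fe": 5.30, "Mn": 3.90,
    "Mo": 4.38, "Ni": 6.20, "V": 3.25, "W": 6.20,
}


def hubbard_u_for(elements):
    """Return the U values that would apply to a list of element symbols.

    Assumption: U is only applied when the material contains O or F.
    """
    if not ({"O", "F"} & set(elements)):
        return {}
    return {el: hubbard_u[el] for el in elements if el in hubbard_u}


fe_oxide_u = hubbard_u_for(["Fe", "O"])
fe_sulfide_u = hubbard_u_for(["Fe", "S"])
```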

Changing the charge density representations

Given one representation $(\mathbf{a}_{1},\mathbf{a}_{2},\mathbf{a}_{3},\rho_{i,j,k})$ of the charge density $\rho$, we may transform it to any other representation $(\mathbf{a}_{1}^{\prime},\mathbf{a}_{2}^{\prime},\mathbf{a}_{3}^{\prime},\rho_{i^{\prime},j^{\prime},k^{\prime}})$ by interpolating the data. Due to computation time and data storage constraints, DFT codes will typically use the fewest grid points possible to represent the charge density, which limits the effectiveness of local interpolation schemes. However, since our charge densities have periodic boundary conditions and are reasonably smooth (owing to the use of pseudo-potentials), the charge density can be represented in Fourier space, and we can up-sample our data via Fourier interpolation 18, as shown in Figure 1. The procedure to perform Fourier interpolation of real-space data is as follows:

  1. Take the discrete Fourier transform of $\rho_{i,j,k}$.

  2. Augment the resulting Fourier data $\phi_{i,j,k}$ with zero-valued higher frequency components.

  3. Apply the inverse transformation to obtain the up-sampled data.

The augmented Fourier data is mathematically equivalent to the original Fourier data. Thus, the inverse transform of the augmented Fourier data must be equivalent to the original real-space data sampled at a higher density. Increasing the grid density using Fourier interpolation enables us to up-sample $\rho_{i,j,k}$ in each direction by a factor of $\gamma_{\rm up}$. We may then resample the local grid with a linear interpolation scheme to ensure the fidelity of our data.
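The three steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in pyRho: the function name `fourier_upsample`, the synthetic test field, and the restriction to an integer up-sampling factor are all choices made for the example. The grid sizes mirror the $12\times 12$ to $48\times 48$ case of Figure 2 (a).

```python
import numpy as np


def fourier_upsample(rho, factor):
    """Up-sample a periodic real array by an integer factor along every axis."""
    old_shape = rho.shape
    new_shape = tuple(n * factor for n in old_shape)

    # Step 1: discrete Fourier transform, zero frequency moved to the center.
    phi = np.fft.fftshift(np.fft.fftn(rho))

    # Step 2: embed the coefficients in a larger array of zeros, keeping the
    # zero-frequency term at the center (zero-valued higher frequencies).
    padded = np.zeros(new_shape, dtype=complex)
    slices = tuple(
        slice(m // 2 - n // 2, m // 2 - n // 2 + n)
        for n, m in zip(old_shape, new_shape)
    )
    padded[slices] = phi

    # Step 3: inverse transform. ifftn divides by the new number of points,
    # so rescale by the ratio of grid sizes to preserve the field values.
    fine = np.fft.ifftn(np.fft.ifftshift(padded)).real
    return fine * factor ** rho.ndim


def field(u, v):
    """Synthetic band-limited periodic field (an assumption for the demo)."""
    return 2.0 + np.cos(2 * np.pi * u) + 0.5 * np.sin(2 * np.pi * (2 * v - u))


def sample(n):
    """Sample the synthetic field on an n x n grid in fractional coordinates."""
    u, v = np.meshgrid(np.arange(n) / n, np.arange(n) / n, indexing="ij")
    return field(u, v)


coarse = sample(12)
fine = fourier_upsample(coarse, 4)  # 12 x 12 -> 48 x 48
```

Because the synthetic field contains only frequencies below the Nyquist limit of the coarse grid, the up-sampled array agrees with a direct evaluation on the fine grid to round-off. For real charge densities the agreement depends on how well the original grid resolves the field.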

Given a primitive-cell representation of the charge density, $(\mathbf{a}_{1},\mathbf{a}_{2},\mathbf{a}_{3},\rho_{i,j,k})$, any periodic representation of a scalar field $f(\mathbf{r})$ can be understood as applying an arbitrary translation on the unit cell by a vector $\mathbf{t}$:

$$\hat{T}_{\mathbf{t}}f(\mathbf{r})\equiv f(\mathbf{r}-\mathbf{t})\,, \tag{2}$$

followed by a super-cell transformation $\hat{P}$ defined as an integer matrix with $\det(\hat{P})\geq 1$ which acts on the lattice vectors from the right

$$(\mathbf{a}^{\prime}_{1}\,\mathbf{a}^{\prime}_{2}\,\mathbf{a}^{\prime}_{3})=(\mathbf{a}_{1}\,\mathbf{a}_{2}\,\mathbf{a}_{3})\hat{P}\,. \tag{3}$$

Our software is capable of performing the same operations in arbitrary dimensions. As an example, in Figure 2, we show the results of re-gridding a plane of the charge density in a two-atom Si unit cell which only cuts across a single atom at the origin. Figure 2 (a) shows the result of Fourier interpolating the field from a $12\times 12$ grid (large circles) onto a $48\times 48$ grid (smaller circles). In Figure 2 (b), the modified representation is obtained by first shifting the origin to the center of the cell at $\mathbf{t}=(\mathbf{a}_{1}+\mathbf{a}_{2})/2$ followed by a change of basis to $\mathbf{a}^{\prime}_{1}=2\mathbf{a}_{1}$ and $\mathbf{a}^{\prime}_{2}=2\mathbf{a}_{2}-\mathbf{a}_{1}$.
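The translation and super-cell transformation can be sketched for a two-dimensional plane as follows. This is an illustrative sketch and not the pyRho implementation: the function names, the random stand-in data, and the output grid are assumptions, and the shift is interpreted here as the fractional position of the new origin in the original cell. With lattice vectors stored as the columns of a matrix, a point with fractional coordinate $\mathbf{f}^{\prime}$ in the new cell has fractional coordinate $\hat{P}\mathbf{f}^{\prime}$ plus the shift in the original cell, and periodic multilinear interpolation supplies the value there.

```python
import itertools

import numpy as np


def periodic_linear_interp(rho, frac):
    """Multilinear interpolation of periodic grid data.

    rho  : array of shape (N1, ..., Nd)
    frac : array of shape (..., d) holding fractional coordinates
    """
    shape = np.array(rho.shape)
    pos = (frac % 1.0) * shape            # position in grid units
    base = np.floor(pos).astype(int)      # lower corner of the enclosing cell
    weight = pos - base                   # offset from the lower corner
    out = np.zeros(frac.shape[:-1])
    for corner in itertools.product((0, 1), repeat=rho.ndim):
        corner = np.array(corner)
        idx = (base + corner) % shape     # wrap around the periodic boundary
        w = np.prod(np.where(corner == 1, weight, 1.0 - weight), axis=-1)
        out += w * rho[tuple(np.moveaxis(idx, -1, 0))]
    return out


def regrid(rho, P, shift, grid_out):
    """Sample rho in the cell (a1' ... ad') = (a1 ... ad) P.

    `shift` is the fractional position (in the original cell) of the new
    origin; this sign convention is an assumption of the sketch.
    """
    P = np.asarray(P, dtype=float)
    axes = [np.arange(n) / n for n in grid_out]
    frac_new = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    frac_old = frac_new @ P.T + np.asarray(shift)   # f = P f' + shift
    return periodic_linear_interp(rho, frac_old)


rng = np.random.default_rng(0)
rho = rng.random((12, 12))        # hypothetical 12 x 12 plane of data

# a1' = 2 a1 and a2' = 2 a2 - a1, with the origin moved to the cell center.
P = [[2, -1], [0, 2]]
supercell = regrid(rho, P, shift=(0.5, 0.5), grid_out=(24, 12))

# The new cell contains |det P| copies of the original cell.
n_copies = round(abs(np.linalg.det(np.array(P))))
```

The output grid of $24\times 12$ points is chosen so that every new grid point coincides with an original one, which makes the result easy to check by hand; any other grid size works in the same way but relies on the interpolation.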

While integer-valued supercell transformations will yield an equivalent periodic charge density, non-integer basis transformations are used to obtain an arbitrary crop of the periodic charge density sampled at any density. As an example, we show how a non-periodic cubic sample of the surface charge density can be obtained from the slab calculation in Figure 2 (c). The simulation was performed using a $7.73\,\text{Å}\times 3.87\,\text{Å}\times 21.88\,\text{Å}$ orthorhombic Si slab cell and the charge density is stored on a $120\times 60\times 336$ grid. A $5\,\text{Å}\times 5\,\text{Å}\times 5\,\text{Å}$ cropped region of the charge density sampled on a $48\times 48\times 48$ grid is indicated by the blue iso-surface in Figure 2 (c). It is important to note that the cropped cell can be larger than the simulation cell in any dimension. In the example, the smallest dimension of the simulation cell is 3.87 Å while the cropped cube has side lengths of 5 Å. This feature essentially allows us to robustly obtain the charge density in any preferred real-space dimensions, independent of the simulation cell parameters, and to freely choose the simulation cell in situations where periodic-image effects are not present.
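The cropping step can be sketched by mapping the Cartesian points of the cube back into fractional coordinates of the simulation cell and wrapping them periodically. The cell lengths and grid sizes below are those quoted in the text. Everything else is an assumption of the sketch: the data is a synthetic field rather than a real slab density, the cube origin is arbitrary, the function names are hypothetical, and a nearest-grid-point lookup is used in place of a proper interpolation to keep the example short.

```python
import numpy as np


def analytic(frac):
    """Synthetic periodic field standing in for real charge density data."""
    return np.cos(2 * np.pi * frac).sum(axis=-1)


# Cell lengths (Å) and grid size quoted in the text for the Si slab.
lattice = np.diag([7.73, 3.87, 21.88])        # rows are a1, a2, a3
grid = (120, 60, 336)
ax = [np.cos(2 * np.pi * np.arange(n) / n) for n in grid]
rho = ax[0][:, None, None] + ax[1][None, :, None] + ax[2][None, None, :]


def cube_frac_coords(lattice, origin, side, n):
    """Fractional coordinates of an n^3 grid filling a cube of edge `side`."""
    ticks = np.arange(n) * side / n
    cart = np.stack(np.meshgrid(ticks, ticks, ticks, indexing="ij"), axis=-1)
    cart = cart + np.asarray(origin)
    return cart @ np.linalg.inv(lattice)      # Cartesian -> fractional


def sample_nearest(rho, frac):
    """Look up the nearest grid point, wrapping across periodic boundaries."""
    shape = np.array(rho.shape)
    idx = np.rint((frac % 1.0) * shape).astype(int) % shape
    return rho[idx[..., 0], idx[..., 1], idx[..., 2]]


frac = cube_frac_coords(lattice, origin=(0.0, 0.0, 8.0), side=5.0, n=48)
cube = sample_nearest(rho, frac)

# Extent of the cube in units of each cell length; a value above 1 means the
# crop is wider than the simulation cell along that direction.
frac_extent = 5.0 / np.diag(lattice)

# Deviation from the exact field, caused only by the nearest-point lookup.
max_error = float(np.max(np.abs(cube - analytic(frac))))
```

Along the second lattice vector the cube spans about 1.29 cell lengths, so part of the crop is filled from the periodic image of the data, which is the situation described above.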

Database details

We use a hybrid data model to serve the data: queryable data such as chemical formula, total energy, and calculation parameters are served as JSON-like documents using MongoDB, while the much larger, non-queryable charge density data is served using AWS S3 object storage 19. When a charge density is parsed from the output file to a serialized object, a unique Object ID is assigned and stored alongside the other data in the MongoDB database. From the user’s perspective, two subsequent API requests are needed: one to obtain calculation inputs and outputs from MongoDB, and another for the Object ID and charge density data. A visual representation of the data flow is provided in Figure 3.
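The hybrid data model can be illustrated with a toy sketch in which two in-memory dictionaries stand in for the MongoDB collection and the S3 bucket. All names, fields, and values here are hypothetical placeholders and do not reflect the actual schema of the database; the point is only to show how the Object ID links the small queryable document to the large serialized array.

```python
import json
import uuid

metadata_store = {}   # stands in for the MongoDB collection (queryable)
object_store = {}     # stands in for the S3 bucket (large binary objects)


def ingest(material_id, formula, grid_values):
    """Store the charge density in the object store and link it by ID."""
    object_id = uuid.uuid4().hex
    object_store[object_id] = json.dumps(
        {"object_id": object_id, "data": grid_values}
    )
    metadata_store[material_id] = {"formula": formula, "object_id": object_id}
    return object_id


def fetch(material_id):
    """Two lookups: the queryable document first, then the linked object."""
    doc = metadata_store[material_id]                      # first request
    payload = json.loads(object_store[doc["object_id"]])   # second request
    return doc, payload["data"]


# Placeholder values, not real data.
stored_id = ingest("example-id", "Si", [[0.1, 0.2], [0.3, 0.4]])
doc, data = fetch("example-id")
```

Keeping the array out of the queryable store means that searches only touch small documents, while the large object is transferred once the desired entry has been identified.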

Code availability

The software used to access and transform the charge density data is accessible from the Materials Project API (https://github.com/materialsproject/api) and pyRho (https://github.com/materialsproject/pyRho) Python package repositories on GitHub. See the Usage Notes section for more information.

Data Records

Raw charge density data output from DFT calculations can be obtained from the corresponding MP API endpoint: https://api.materialsproject.org/charge_density. Each entry can be referenced with a particular DOI through the associated MP material. Additionally, the input parameters for the specific calculation used to generate the entry can be obtained from the tasks endpoint at https://api.materialsproject.org/tasks. Details for how to interact with the referenced endpoints can be found in the Usage Notes section.

Technical Validation

We can elucidate the performance of the re-gridding algorithm using a larger set of elemental polymorphs from the Materials Project. For this test set $\mathcal{S}_{\rm el}$, we selected 389 single-element structures from MP for which the energy above the convex hull was less than 100 meV and the number of atoms in the unit cell was less than 20. For each structure in $\mathcal{S}_{\rm el}$, we perform VASP static calculations on the primitive unit cell and on a super-cell using

$$\hat{P}=\begin{pmatrix}1&1&0\\ 1&-1&0\\ 0&0&1\end{pmatrix}\,. \tag{4}$$

For each charge density obtained using an explicit super-cell calculation, we obtain the average error compared to a super-cell charge density obtained by transforming the primitive-cell charge density. The results of the comparison are shown in Figure 4. We observe that an up-sampling factor of 4 results in a periodic grid fine enough that the pseudo-charge density, which typically ranges from 0 to 100 $e^{-}/\text{Å}$ near the atomic cores, only exhibits a difference of 0.001 $e^{-}/\text{Å}$.

Usage Notes

To facilitate access to data, convenience functions have been implemented as part of the Materials Project API Python client. These are contained within the MPRester class as part of the pymatgen software package (https://github.com/materialsproject/pymatgen). More specifically, two functions are provided to send independent requests to the API endpoints. These take as input the Materials Project ID associated with a given material in the database. The calculation input data from the tasks endpoint is then returned as a set of key-value pairs within a Python dictionary, and the charge density data is de-serialized and returned as a pymatgen CHGCAR object. With the MPRester class imported, the following code workflow can be used.

from pymatgen.ext.matproj import MPRester

API_KEY = "<API_KEY>"  # replace with your Materials Project API key

# Obtain the CHGCAR object for a given material ID
with MPRester(API_KEY) as mpr:
    chgcar = mpr.get_chgcar_from_mpid("mp-149")

# To also obtain the full list of inputs for the charge density calculation
with MPRester(API_KEY) as mpr:
    chgcar, calc_inputs = mpr.get_chgcar_from_mpid("mp-149", inc_inputs=True)

In order to alter the representation of the charge density obtained from the API endpoint, the pyRho python package (https://github.com/materialsproject/pyRho) can be used alongside the obtained pymatgen (https://github.com/materialsproject/pymatgen) CHGCAR object. Examples of how to re-grid, interpolate, and visualize are included in the repository as a set of Jupyter 20 notebooks.

Acknowledgements

This work was supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division under contract no. DE-AC02-05-CH11231 (Materials Project program KC23MP).

Author contributions

JXS developed the regridding analysis software; JXS and SD developed the back-end API and JMM front-end API; JMM performed the DFT calculations that produced the charge densities; JXS, JMM, MKH, PH, and SD also participated in aggregating, ingesting and maintaining the data at different stages. KAP was responsible for supervising and advising the project at all stages.

Competing interests

The authors declare no competing interests.

Figures

Figure 1: Schematic of data transfer for Fourier interpolation and compression. The more densely sampled (larger) and the coarsely sampled (smaller) 3D blocks of real-space data can each be transformed to Fourier space, resulting in a Fourier representation of the same size. To up-sample the data, we use the smaller block in Fourier space, augment with zeros while keeping all the data fixed near the origin (at the corners of the cube). To compress, we crop the data in Fourier space and perform an inverse Fourier transform.
Figure 2: Periodic re-gridding applied to a plane of the charge density for a two-atom Si unit cell where each large circle corresponds to a data point in the original $12\times 12$ grid. The result of Fourier-interpolating the data in the original unit cell onto a $48\times 48$ grid is shown in (a). The transformed representation $\hat{\mathbf{a}}_{1}=\mathbf{a}_{1}+\mathbf{a}_{2}$ and $\hat{\mathbf{a}}_{2}=2\mathbf{a}_{1}-\mathbf{a}_{2}$ with a shift of $0.4\hat{\mathbf{a}}_{1}$ and a grid size of $48\times 48$ is shown in (b).
Figure 3: Data pipeline for the charge density database, illustrating how the output files from quantum chemistry calculations are stored and accessed. The data are first converted into a JSON-like format to be stored on a MongoDB server, which allows queries on any of the stored fields. The charge density data is converted into an array-like object with additional meta-data (e.g. ObjectID) and stored in an AWS S3 bucket. Since the ObjectID is stored as a field in the MongoDB database, the API is able to combine the MongoDB data with the corresponding S3 object and reconstruct the original data.
Figure 4: Distributions of the errors between re-sampled charge density and explicitly calculated charge densities.


References

  • 1 P. Hohenberg and W. Kohn. Inhomogeneous Electron Gas. Phys. Rev., 136(3B):B864–B871, 1964.
  • 2 W Kohn and L J Sham. Self-Consistent Equations Including Exchange and Correlation Effects. Phys. Rev., 140(4A):A1133–A1138, 1965.
  • 3 D M Ceperley and B J Alder. Ground State of the Electron Gas by a Stochastic Method. Phys. Rev. Lett., 45(7):566–569, 1980.
  • 4 John P Perdew, Kieron Burke, and Matthias Ernzerhof. Generalized gradient approximation made simple. Phys. Rev. Lett., 77(18):3865, 1996.
  • 5 R. M. Martin. Electronic Structure. Cambridge University Press, Cambridge, England, UK, 2004.
  • 6 M Gajdoš, K Hummer, G Kresse, J Furthmüller, and F Bechstedt. Linear optical properties in the projector-augmented wave methodology. Phys. Rev. B, 73(4):45112, 2006.
  • 7 Richard F. W. Bader. Atoms in Molecules: A Quantum Theory (International Series of Monographs on Chemistry (22)). Clarendon Press, Jun 1994.
  • 8 P. L. A. Popelier. A fast algorithm to compute atomic charges based on the topology of the electron density. Theor. Chem. Acc., 105(4):393–399, Apr 2001.
  • 9 A. Otero-de-la Roza, Erin R. Johnson, and Víctor Luaña. Critic2: A program for real-space analysis of quantum chemical interactions in solids. Comput. Phys. Commun., 185(3):1007–1018, Mar 2014.
  • 10 Ziqin Rong, Daniil Kitchaev, Pieremanuele Canepa, Wenxuan Huang, and Gerbrand Ceder. An efficient algorithm for finding the minimum energy path for cation migration in ionic materials. J. Chem. Phys., 145(7):074112, Aug 2016.
  • 11 Leonid Kahle, Aris Marcolongo, and Nicola Marzari. Modeling lithium-ion solid-state electrolytes with a pinball model. Phys. Rev. Mater., 2(6):065405, Jun 2018.
  • 12 Jimmy-Xuan Shen, Matthew Horton, and Kristin A. Persson. A charge-density-based general cation insertion algorithm for generating new Li-ion cathode materials. npj Comput. Mater., 6(161):1–7, Oct 2020.
  • 13 Seiji Kajita, Nobuko Ohba, Ryosuke Jinnouchi, and Ryoji Asahi. A Universal 3D Voxel Descriptor for Solid-State Material Informatics with Deep Convolutional Neural Networks. Sci. Rep., 7(16991):1–9, Dec 2017.
  • 14 Christoph Freysoldt, Blazej Grabowski, Tilmann Hickel, Jörg Neugebauer, Georg Kresse, Anderson Janotti, and Chris G. Van de Walle. First-principles calculations for point defects in solids. Rev. Mod. Phys., 86(1):253–305, 2014.
  • 15 Kiran Mathew, Joseph H. Montoya, Alireza Faghaninia, Shyam Dwarakanath, Muratahan Aykol, Hanmei Tang, Iek heng Chu, Tess Smidt, Brandon Bocklund, Matthew Horton, John Dagdelen, Brandon Wood, Zi Kui Liu, Jeffrey Neaton, Shyue Ping Ong, Kristin Persson, and Anubhav Jain. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci., 139:140–152, Nov 2017.
  • 16 Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, and Kristin A. Persson. The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1):011002, 2013.
  • 17 Anubhav Jain, Geoffroy Hautier, Shyue Ping Ong, Charles J. Moore, Christopher C. Fischer, Kristin A. Persson, and Gerbrand Ceder. Formation enthalpies by mixing GGA and GGA + U calculations. Phys. Rev. B, 84(4):045115, Jul 2011.
  • 18 Francis P. Russell, Karl A. Wilkinson, Paul H. J. Kelly, and Chris-Kriton Skylaris. Optimised three-dimensional Fourier interpolation: An analysis of techniques and application to a linear-scaling density functional theory code. Comput. Phys. Commun., 187:8–19, Feb 2015.
  • 19 Thomas J. Leeper. AWS S3 Client Package [R package aws.s3 version 0.3.3], Jun 2017.
  • 20 Thomas Kluyver, Benjamin Ragan-Kelley, Fernando Pérez, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica Hamrick, Jason Grout, Sylvain Corlay, Paul Ivanov, Damián Avila, Safia Abdalla, and Carol Willing. Jupyter notebooks – a publishing format for reproducible computational workflows. In F. Loizides and B. Schmidt, editors, Positioning and Power in Academic Publishing: Players, Agents and Agendas, pages 87 – 90. IOS Press, 2016.