A representation-independent electronic charge density database for crystalline materials
Abstract
In addition to being the core quantity in density functional theory, the charge density can be used in many tertiary analyses in materials sciences from bonding to assigning charge to specific atoms. The charge density is data-rich since it contains information about all the electrons in the system. With increasing utilization of machine-learning tools in materials sciences, a data-rich object like the charge density can be utilized in a wide range of applications. The database presented here provides a modern and user-friendly interface for a large and continuously updated collection of charge densities as part of the Materials Project. In addition to the charge density data, we provide the theory and code for changing the representation of the charge density which should enable more advanced machine-learning studies for the broader community.
1. Department of Materials Science and Engineering, University of California, Berkeley, Berkeley, California 94720, United States
2. Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, 94720, United States
3. Energy Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, 94720, United States
*corresponding author(s): Kristin A. Persson
Background & Summary
The application of Density Functional Theory (DFT) to many-electron systems has witnessed tremendous growth in the past few decades and has now become the de facto simulation tool for physicists, chemists, and materials scientists. The central concept of DFT is that the energy, and in turn all of the physical properties of a quantum system, are completely determined by the electronic charge density of the ground state $\rho(\mathbf{r})$, where $\mathbf{r}$ is the position vector 1. The majority of the computational cost in typical DFT calculations is associated with determining $\rho(\mathbf{r})$ via an iterative algorithm that arrives at a self-consistent charge density 2. For the most commonly used exchange-correlation functionals, like the local density approximation (LDA) 2; 3 and the semi-local functional by Perdew–Burke–Ernzerhof (PBE) 4, a converged charge density can be used as the starting point for more expensive calculations such as obtaining a detailed band structure 5 or calculating the optical response of the material 6.
In addition to its central role in standard DFT calculations, the charge density itself is also useful for the analysis of many materials properties. The critical points of the charge density (i.e. where the gradient is zero) are often used as a boundary between atomic neighborhoods. In turn, this allows for a systematic assignment of charge to specific atoms 7; 8, as well as the determination of bonding character between neighboring pairs 9. Within the realm of energy materials, the charge density can be used as an effective potential to study the migration properties of Li in solid-state materials, as low charge density provides a metric of “free” space in a lattice 10; 11. Consequently, the local minima of the charge density can act as initial guesses for the positions of inserted cations 12.
A single DFT calculation of the primitive unit cell provides one representation of the charge density within that particular basis set. However, depending on the data application, alternative representations might be desired. An important example of this is in machine-learning (ML) algorithms, where obtaining a consistent data representation is absolutely essential for deep-learning methods. The representation of charge density, however, is heavily influenced by the simulation cell and the Bravais lattice of the periodic structure. Hence, a necessary step in using electronic charge densities in machine-learning applications is the ability to obtain alternative representations of the same physical density. While recent work has examined the effectiveness of representations in Fourier space 13, any ML investigation of local interactions (e.g. adsorption and intercalation of ions) requires flexible representations in real space. Towards that end, our database will provide code to obtain arbitrary real space representations of charge density for a given material directly from a DFT-computed charge density.
The charge density of any crystalline solid, and indeed any periodic field, is naturally represented in a plane-wave basis set, where the inherent periodicity of the system is embedded in the underlying representation. For a sufficiently converged finite plane-wave basis, the continuous charge density $\rho(\mathbf{r})$ and its Fourier transform $\tilde{\rho}(\mathbf{G})$ can be accurately sampled by a three-dimensional array indexed by $i$, $j$, and $k$ with $N_a$, $N_b$, and $N_c$ evenly spaced grid-points along each lattice vector, and can be converted from one to the other via a discrete Fourier transform

$$\rho\left(\frac{i}{N_a}\mathbf{a} + \frac{j}{N_b}\mathbf{b} + \frac{k}{N_c}\mathbf{c}\right) \;\Longleftrightarrow\; \tilde{\rho}\left(i\,\mathbf{a}^{*} + j\,\mathbf{b}^{*} + k\,\mathbf{c}^{*}\right) \qquad (1)$$

where $\mathbf{a}, \mathbf{b}, \mathbf{c}$ and $\mathbf{a}^{*}, \mathbf{b}^{*}, \mathbf{c}^{*}$ represent the real and reciprocal lattice vectors, and $i$, $j$, and $k$ are the indices of regularly-spaced grid points along the lattice vectors. Due to the discrete nature of numerical Fourier transforms, the number of grid points of a real-space representation is equal to the number of plane waves needed to represent the data in reciprocal space.
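As a small illustration of the statement that the real-space grid and the plane-wave coefficients carry the same number of values, the sketch below transforms a toy periodic field with NumPy. The grid sizes and the field itself are arbitrary choices made for this example, not values taken from the database.

```python
import numpy as np

# Hypothetical grid with N_a x N_b x N_c points along the three lattice vectors.
n_a, n_b, n_c = 6, 8, 10
i, j, k = np.meshgrid(np.arange(n_a), np.arange(n_b), np.arange(n_c), indexing="ij")

# A smooth periodic toy field standing in for a charge density.
rho_real = 2.0 + np.cos(2 * np.pi * i / n_a) + 0.5 * np.sin(4 * np.pi * k / n_c)

# Real space -> plane-wave coefficients, and back again.
rho_recip = np.fft.fftn(rho_real)
rho_back = np.fft.ifftn(rho_recip).real

# Both representations hold the same number of values, and the round trip is lossless.
same_shape = rho_recip.shape == rho_real.shape
round_trip_error = float(np.max(np.abs(rho_back - rho_real)))
```

The round-trip error is at the level of floating-point noise, which is what makes it possible to move freely between the two representations.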
A representation of the charge density is uniquely determined by three lattice vectors and a scalar matrix, either in real or reciprocal space. Each representation only provides a “view” of the infinite periodic data represented in a specific unit cell, and an infinite number of such representations exist for a given charge density. Regardless of the grid size and the periodic cell representation, the DFT-computed charge density represents the same underlying field, yet charge densities are routinely recomputed whenever the representation needs to change, even when the computational parameters are unchanged, often at considerable expense. One common example is the use of the electrostatic potential of a super cell to correct for the periodic image effects of a charged defect 14.
Due to the significant amount of computational resources devoted to computing the electronic charge densities, as well as the growing domains of their application especially for the training of data-intensive machine learning models, there is a pressing need for a large-scale representation-independent database of charge densities. The Materials Project (https://materialsproject.org) - as a rapidly growing (currently more than 170,000 users) materials informatics resource - is a natural platform for the dissemination of such data. The work presented in this article provides details on how the charge densities in our database are computed and how they can be accessed. In addition, we provide a high-level API for querying and post-processing of the charge density data. Among other features, the API will allow users to take an existing atomic structure and query for charge density of the same material, in the representation/view of the user’s choosing.
Methods
In this section, we provide details on the scope of the charge densities database (CDD) and the precise set of computational parameters used to generate the data. Additionally, we will demonstrate features of the API that allow users to obtain arbitrary views of the charge density data including up-sampling/compressing the data via Fourier analysis and symmetry operations like translations, rotations, and super-cell transformations.
Calculation parameters
The charge densities are obtained from DFT calculations run using the static calculation workflow within the atomate software package 15, and relaxed input structures from the Materials Project (MP) database 16. The projector-augmented wave (PAW) method as implemented in the plane-wave Vienna Ab-initio Simulation Package (VASP) is used in conjunction with the PBE generalized-gradient approximation functional 4. The default set of MP calculation input parameters was used, which has been demonstrated to produce well-converged results 17. Included in these parameters are an energy cutoff of 520 eV, a total energy error threshold (in eV/atom), and a reciprocal $k$-point density. The only addition made to the input set is to enable aspherical contributions in the gradient corrections inside the PAW spheres. Hubbard $U$ corrections are included for materials containing oxygen and fluorine. The elements Co, Cr, Fe, Mn, Mo, Ni, V, and W use $U$ values of 3.32, 3.70, 5.30, 3.90, 4.38, 6.20, 3.25, and 6.20 eV, respectively.
Changing the charge density representations
Given one representation of the charge density $\rho(\mathbf{r})$, we may transform it to any other representation by interpolating the data. Due to computation time and data storage constraints, DFT codes will typically use the fewest grid points possible to represent the charge density, which limits the effectiveness of local interpolation schemes. However, since our charge densities have periodic boundary conditions and are reasonably smooth (owing to the use of pseudo-potentials), the charge density can be represented in Fourier space, and we can up-sample our data via Fourier interpolation 18 as shown in Figure 1. The procedure to perform Fourier interpolation of real space data is as follows:
1. Take the discrete Fourier transform of the real-space data.
2. Augment the resulting Fourier data with zero-valued higher frequency components.
3. Apply the reverse transformation to obtain the up-sampled data.
The augmented Fourier data is mathematically equivalent to the original Fourier data. Thus, the inverse transform of the augmented Fourier data must be equivalent to the original real space data sampled at a higher density. Fourier interpolation therefore lets us increase the grid density in each direction by a chosen up-sampling factor. We may then resample the up-sampled grid with a linear interpolation scheme to ensure the fidelity of our data.
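The three-step procedure above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in pyRho: the function name `fourier_upsample`, the integer up-sampling factor, and the toy density are all hypothetical, and the sketch assumes the input has no content at the Nyquist frequency of the coarse grid.

```python
import numpy as np

def fourier_upsample(rho, factor):
    """Up-sample a periodic array by an integer factor along every axis."""
    old_shape = np.array(rho.shape)
    new_shape = old_shape * factor
    # Step 1: discrete Fourier transform, with the zero frequency moved to the centre.
    spectrum = np.fft.fftshift(np.fft.fftn(rho))
    # Step 2: embed the spectrum in a larger array of zero-valued high frequencies.
    padded = np.zeros(tuple(new_shape), dtype=complex)
    start = new_shape // 2 - old_shape // 2
    window = tuple(slice(s, s + n) for s, n in zip(start, old_shape))
    padded[window] = spectrum
    # Step 3: inverse transform; rescale because ifftn divides by the number of points.
    scale = np.prod(new_shape) / np.prod(old_shape)
    return np.fft.ifftn(np.fft.ifftshift(padded)).real * scale

def toy_density(n):
    """Smooth periodic test field sampled on an n x n x n grid (illustrative only)."""
    x = np.arange(n) / n
    gx, gy, gz = np.meshgrid(x, x, x, indexing="ij")
    return 1.0 + np.cos(2 * np.pi * gx) * np.cos(2 * np.pi * gy) + 0.3 * np.sin(4 * np.pi * gz)

coarse = toy_density(8)
fine = fourier_upsample(coarse, 4)
# Compare against the same field evaluated directly on the finer grid.
max_error = float(np.max(np.abs(fine - toy_density(32))))
```

Because the toy field is band-limited well below the resolution of the coarse grid, the up-sampled values agree with direct evaluation to floating-point precision; a real charge density would agree only to the extent that its plane-wave expansion is converged.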
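Given a primitive-cell representation of the charge density $\rho(\mathbf{r})$, any periodic representation of a scalar field can be understood as applying an arbitrary translation on the unit cell by a vector $\mathbf{t}$:

$$\rho'(\mathbf{r}) = \rho(\mathbf{r} + \mathbf{t}) \qquad (2)$$

followed by a super-cell transformation defined as an integer matrix $\mathbf{M}$ with $\det(\mathbf{M}) \neq 0$ which acts on the lattice vectors from the right

$$(\mathbf{a}', \mathbf{b}', \mathbf{c}') = (\mathbf{a}, \mathbf{b}, \mathbf{c})\,\mathbf{M} \qquad (3)$$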
Our software is capable of performing the same operations in arbitrary dimensions. As an example, in Figure 2, we show the results of re-gridding in two dimensions using a plane of the charge density in a two-atom Si unit cell which only cuts across a single atom at the origin. Figure 2 (a) shows the result of Fourier interpolating the field from a coarse grid (large circles) onto a finer grid (smaller circles). In Figure 2 (b), the modified representation is obtained by first shifting the origin to the center of the cell, followed by a change of basis.
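For the simplest cases, the translation and super-cell operations described above reduce to array manipulations on the stored grid. The sketch below is an illustration only: it handles translations by whole grid points and diagonal integer super-cell matrices, and the helper names `translate_grid` and `diagonal_supercell` are hypothetical. General translations and non-diagonal matrices require resampling the field, which is not shown here.

```python
import numpy as np

def translate_grid(rho, shift_in_grid_points):
    """Move the cell origin by whole grid points, wrapping periodically.

    Convention assumed here: new[i] = old[i + shift], which is a roll by -shift.
    """
    shift = tuple(-s for s in shift_in_grid_points)
    return np.roll(rho, shift, axis=tuple(range(rho.ndim)))

def diagonal_supercell(rho, repeats):
    """Repeat the primitive-cell grid for a diagonal integer super-cell matrix."""
    return np.tile(rho, repeats)

# Toy 2D "density" on a 4 x 4 grid, in the spirit of the planar example in Figure 2.
rho = np.arange(16.0).reshape(4, 4)

# Shift the origin to the centre of the cell (2 grid points along each axis).
centred = translate_grid(rho, (2, 2))

# Double the cell along the first lattice vector only.
doubled = diagonal_supercell(rho, (2, 1))
```

Both operations only rearrange or repeat existing values, so the average density is unchanged; that invariance is a convenient sanity check when chaining transformations.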
While integer-valued supercell transformations will yield an equivalent periodic charge density, non-integer basis transformations are used to obtain an arbitrary crop of periodic charge density sampled at any density. As an example, we show how a non-periodic cubic sample of the surface charge density can be obtained from the slab calculation in Figure 2 (c). The simulation was performed using an orthorhombic Si slab cell and the charge density is stored on a regular grid. A cropped region of the charge density, sampled on a separate grid, is indicated by the blue iso-surface in Figure 2 (c). It is important to note that the cropped cell can be larger than the simulation cell in any dimension. In the example, the smallest dimension of the simulation cell is 3.87 Å while the cropped cube has side lengths of 5 Å. This feature allows us to robustly obtain the charge density in any preferred real-space dimensions, independent of the simulation cell parameters, and to freely choose the simulation cell in situations where periodic-image effects are not present.
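A crop that is larger than the simulation cell works because every sample point can be wrapped back into the cell before interpolating. The two-dimensional sketch below illustrates the idea with plain bilinear interpolation; the function name `sample_periodic`, the 3 Å by 4 Å cell, and the toy field are assumptions made for this example and do not correspond to the slab calculation in Figure 2 (c).

```python
import numpy as np

def sample_periodic(rho, lattice, points):
    """Bilinearly interpolate a periodic 2D grid at Cartesian points.

    rho     : (n_x, n_y) array of values on a regular grid spanning one cell
    lattice : (2, 2) array whose rows are the lattice vectors
    points  : (P, 2) array of Cartesian coordinates, inside or outside the cell
    """
    n = np.array(rho.shape)
    frac = points @ np.linalg.inv(lattice)   # Cartesian -> fractional coordinates
    grid = (frac % 1.0) * n                  # wrap into the cell, in grid units
    lower = np.floor(grid).astype(int)
    w = grid - lower                         # interpolation weights
    i0 = lower % n                           # periodic wrap of both neighbours
    i1 = (lower + 1) % n
    return (
        (1 - w[:, 0]) * (1 - w[:, 1]) * rho[i0[:, 0], i0[:, 1]]
        + w[:, 0] * (1 - w[:, 1]) * rho[i1[:, 0], i0[:, 1]]
        + (1 - w[:, 0]) * w[:, 1] * rho[i0[:, 0], i1[:, 1]]
        + w[:, 0] * w[:, 1] * rho[i1[:, 0], i1[:, 1]]
    )

# Hypothetical rectangular cell (3 x 4 Angstrom) sampled on a 30 x 40 grid.
cell = np.array([[3.0, 0.0], [0.0, 4.0]])
fx, fy = np.meshgrid(np.arange(30) / 30, np.arange(40) / 40, indexing="ij")
rho = 1.0 + np.cos(2 * np.pi * fx) * np.cos(2 * np.pi * fy)

# Crop a 5 x 5 Angstrom square, larger than the cell in both directions.
side, m = 5.0, 25
axis = np.linspace(0.0, side, m, endpoint=False)
px, py = np.meshgrid(axis, axis, indexing="ij")
points = np.stack([px.ravel(), py.ravel()], axis=1)
crop = sample_periodic(rho, cell, points).reshape(m, m)

# Reference: the same analytic field evaluated directly at the crop points.
exact = 1.0 + np.cos(2 * np.pi * px / 3.0) * np.cos(2 * np.pi * py / 4.0)
max_error = float(np.max(np.abs(crop - exact)))
```

In this toy setup the crop points coincide with stored grid points, so the interpolation is essentially exact. For arbitrary sample points the accuracy of a local scheme like this depends on the grid spacing, which is why Fourier up-sampling is applied first.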
Database details
We use a hybrid data model to serve the data: queryable data such as chemical formula, total energy, and calculation parameters are served as JSON-like documents using MongoDB, while the much larger, non-queryable charge density data is served using AWS S3 object storage 19. When a charge density is parsed from the output file to a serialized object, a unique Object ID is assigned and stored alongside the other data in the MongoDB database. From the user’s perspective, two successive API requests are needed: one to obtain calculation inputs and outputs from MongoDB, and another for the Object ID and charge density data. A visual representation of the data flow is provided in Figure 3.
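The hybrid model can be pictured with two in-memory dictionaries standing in for the document database and the object store. This is a schematic sketch only: the store names, field names, and example values are hypothetical and do not reflect the actual Materials Project schema.

```python
import json
import uuid

metadata_store = {}   # stands in for the queryable document database
object_store = {}     # stands in for bulk object storage

def ingest(task_id, formula, energy, density):
    """Serialize a charge density and link it to its metadata via an object ID."""
    object_id = uuid.uuid4().hex
    object_store[object_id] = json.dumps(density)
    metadata_store[task_id] = {
        "formula": formula,
        "energy": energy,
        "object_id": object_id,
    }
    return object_id

def fetch(task_id):
    """Two lookups: first the small metadata document, then the large object."""
    doc = metadata_store[task_id]
    density = json.loads(object_store[doc["object_id"]])
    return doc, density

# Illustrative values only.
ingest("task-1", "Si", -10.8, [[0.1, 0.2], [0.3, 0.4]])
doc, density = fetch("task-1")
```

Keeping the bulky array out of the document store means that queries on formula or energy never have to touch the charge density itself.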
Code availability
The software used to access and transform the charge density data is accessible from the Materials Project API (https://github.com/materialsproject/api) and pyRho (https://github.com/materialsproject/pyRho) Python package repositories on GitHub. See the Usage Notes section for more information.
Data Records
Raw charge density data output from DFT calculations can be obtained from the corresponding MP API endpoint: https://api.materialsproject.org/charge_density. Each entry can be referenced with a particular DOI through the associated MP material. Additionally, the input parameters for the specific calculation used to generate the entry can be obtained from the tasks endpoint at https://api.materialsproject.org/tasks. Details for how to interact with the referenced endpoints can be found in the Usage Notes section.
Technical Validation
We can elucidate the performance of the re-gridding algorithm using a larger set of elemental polymorphs from the Materials Project. For this test set, we selected 389 single-element structures from MP for which the energy above the convex hull was less than 100 meV and the number of atoms in the unit cell was less than 20. For each structure in the test set, we perform VASP static calculations on the primitive unit cell and on a super-cell using
(4) |
For each charge density obtained using an explicit super-cell calculation, we obtain the average error compared to a super-cell charge density obtained by transforming the primitive-cell charge density. The results of the comparison are shown in Figure 4. We observe that an up-sampling factor of 4 results in a periodic grid fine enough that the pseudo-charge density, which typically ranges from 0 to 100 near the atomic cores, exhibits a difference of only 0.001.
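The comparison described above amounts to a point-wise difference between two arrays on the same grid. A minimal sketch, assuming the average error is a mean absolute difference (the text does not specify the exact metric) and using synthetic arrays in place of real VASP output:

```python
import numpy as np

def mean_abs_difference(rho_reference, rho_transformed):
    """Average absolute point-wise difference between two densities on the same grid."""
    if rho_reference.shape != rho_transformed.shape:
        raise ValueError("both densities must be sampled on the same grid")
    return float(np.mean(np.abs(rho_reference - rho_transformed)))

# Synthetic stand-ins: a "transformed" density built by repeating a primitive-cell
# grid, and a "reference" that differs from it by a small uniform offset.
primitive = np.random.default_rng(0).random((4, 4, 4))
transformed = np.tile(primitive, (2, 1, 1))
reference = transformed + 0.05

error = mean_abs_difference(reference, transformed)
```

The offset of 0.05 is arbitrary and chosen only so that the expected result of the toy comparison is obvious.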
Usage Notes
To facilitate access to data, convenience functions have been implemented as part of the Materials Project API Python client. These are contained within the MPRester class as part of the pymatgen software package (https://github.com/materialsproject/pymatgen). More specifically, two functions are provided to send independent requests to the API endpoints. These take as input the Materials Project ID associated with a given material in the database. The calculation input data from the tasks endpoint is then returned as a set of key-value pairs within a Python dictionary, and the charge density data is de-serialized and returned as a pymatgen CHGCAR object. With the MPRester class imported, the following code workflow can be used.
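A hedged sketch of such a workflow is given below. It assumes the `mp-api` client package is installed, that its `MPRester` class provides a `get_charge_density_from_material_id` method, and that a valid API key is available; method names and return types can differ between client versions, so the installed documentation should be checked. The wrapper name `fetch_charge_density` is hypothetical.

```python
def fetch_charge_density(material_id, api_key):
    """Return the charge density object for one Materials Project ID.

    Assumption: the installed client exposes
    MPRester.get_charge_density_from_material_id; verify against your version.
    """
    # Imported lazily because the call needs the mp-api package, network access,
    # and a personal API key.
    from mp_api.client import MPRester

    with MPRester(api_key) as mpr:
        return mpr.get_charge_density_from_material_id(material_id)

# Example call (requires network access and a real key), here for silicon:
# chgcar = fetch_charge_density("mp-149", "YOUR_API_KEY")
```

The returned object can then be passed to downstream tools for re-gridding or visualization.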
In order to alter the representation of the charge density obtained from the API endpoint, the pyRho python package (https://github.com/materialsproject/pyRho) can be used alongside the obtained pymatgen (https://github.com/materialsproject/pymatgen) CHGCAR object. Examples of how to re-grid, interpolate, and visualize are included in the repository as a set of Jupyter 20 notebooks.
Acknowledgements
This work was supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division under contract no. DE-AC02-05-CH11231 (Materials Project program KC23MP).
Author contributions
JXS developed the regridding analysis software; JXS and SD developed the back-end API and JMM developed the front-end API; JMM performed the DFT calculations that produced the charge densities; JXS, JMM, MKH, PH, and SD also participated in aggregating, ingesting and maintaining the data at different stages. KAP was responsible for supervising and advising the project at all stages.
Competing interests
The authors declare no competing interests.
Figures




References
- 1 P. Hohenberg and W. Kohn. Inhomogeneous Electron Gas. Phys. Rev., 136(3B):B864–B871, 1964.
- 2 W Kohn and L J Sham. Self-Consistent Equations Including Exchange and Correlation Effects. Phys. Rev., 140(4A):A1133–A1138, 1965.
- 3 D M Ceperley and B J Alder. Ground State of the Electron Gas by a Stochastic Method. Phys. Rev. Lett., 45(7):566–569, 1980.
- 4 John P Perdew, Kieron Burke, and Matthias Ernzerhof. Generalized gradient approximation made simple. Phys. Rev. Lett., 77(18):3865, 1996.
- 5 R. M. Martin. Electronic Structure. Cambridge University Press, Cambridge, England, UK, 2004.
- 6 M Gajdoš, K Hummer, G Kresse, J Furthmüller, and F Bechstedt. Linear optical properties in the projector-augmented wave methodology. Phys. Rev. B, 73(4):45112, 2006.
- 7 Richard F. W. Bader. Atoms in Molecules: A Quantum Theory (International Series of Monographs on Chemistry (22)). Clarendon Press, Jun 1994.
- 8 P. L. A. Popelier. A fast algorithm to compute atomic charges based on the topology of the electron density. Theor. Chem. Acc., 105(4):393–399, Apr 2001.
- 9 A. Otero-de-la Roza, Erin R. Johnson, and Víctor Luaña. Critic2: A program for real-space analysis of quantum chemical interactions in solids. Comput. Phys. Commun., 185(3):1007–1018, Mar 2014.
- 10 Ziqin Rong, Daniil Kitchaev, Pieremanuele Canepa, Wenxuan Huang, and Gerbrand Ceder. An efficient algorithm for finding the minimum energy path for cation migration in ionic materials. J. Chem. Phys., 145(7):074112, Aug 2016.
- 11 Leonid Kahle, Aris Marcolongo, and Nicola Marzari. Modeling lithium-ion solid-state electrolytes with a pinball model. Phys. Rev. Mater., 2(6):065405, Jun 2018.
- 12 Jimmy-Xuan Shen, Matthew Horton, and Kristin A. Persson. A charge-density-based general cation insertion algorithm for generating new Li-ion cathode materials. npj Comput. Mater., 6(161):1–7, Oct 2020.
- 13 Seiji Kajita, Nobuko Ohba, Ryosuke Jinnouchi, and Ryoji Asahi. A Universal 3D Voxel Descriptor for Solid-State Material Informatics with Deep Convolutional Neural Networks. Sci. Rep., 7(16991):1–9, Dec 2017.
- 14 Christoph Freysoldt, Blazej Grabowski, Tilmann Hickel, Jörg Neugebauer, Georg Kresse, Anderson Janotti, and Chris G. Van de Walle. First-principles calculations for point defects in solids. Rev. Mod. Phys., 86(1):253–305, 2014.
- 15 Kiran Mathew, Joseph H. Montoya, Alireza Faghaninia, Shyam Dwarakanath, Muratahan Aykol, Hanmei Tang, Iek heng Chu, Tess Smidt, Brandon Bocklund, Matthew Horton, John Dagdelen, Brandon Wood, Zi Kui Liu, Jeffrey Neaton, Shyue Ping Ong, Kristin Persson, and Anubhav Jain. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci., 139:140–152, Nov 2017.
- 16 Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, and Kristin A. Persson. The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1):011002, 2013.
- 17 Anubhav Jain, Geoffroy Hautier, Shyue Ping Ong, Charles J. Moore, Christopher C. Fischer, Kristin A. Persson, and Gerbrand Ceder. Formation enthalpies by mixing GGA and GGA + U calculations. Phys. Rev. B, 84(4):045115, Jul 2011.
- 18 Francis P. Russell, Karl A. Wilkinson, Paul H. J. Kelly, and Chris-Kriton Skylaris. Optimised three-dimensional Fourier interpolation: An analysis of techniques and application to a linear-scaling density functional theory code. Comput. Phys. Commun., 187:8–19, Feb 2015.
- 19 Thomas J. Leeper. AWS S3 Client Package [R package aws.s3 version 0.3.3], Jun 2017.
- 20 Thomas Kluyver, Benjamin Ragan-Kelley, Fernando Pérez, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica Hamrick, Jason Grout, Sylvain Corlay, Paul Ivanov, Damián Avila, Safia Abdalla, and Carol Willing. Jupyter notebooks – a publishing format for reproducible computational workflows. In F. Loizides and B. Schmidt, editors, Positioning and Power in Academic Publishing: Players, Agents and Agendas, pages 87 – 90. IOS Press, 2016.