Affiliations: Department of Materials Science and Engineering, Northwestern University, Evanston, IL, USA; Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA. Author note: contributed equally to this work.

Database, Features, and Machine Learning Model to Identify Thermally Driven Metal-Insulator Transition Compounds

Alexandru B. Georgescu, Peiwen Ren, Aubrey R. Toland, Shengtong Zhang, Kyle D. Miller, Daniel Apley, Elsa A. Olivetti, Nicholas Wagner, James M. Rondinelli

Abstract

Metal-insulator transition (MIT) compounds are materials that may exhibit insulating or metallic behavior, depending on the physical conditions, and are of immense fundamental interest owing to their potential applications in emerging microelectronics. An important subset of MIT materials comprises those with a transition driven by temperature. Thermally driven MIT materials, however, are scarce, which makes delineating these compounds from those that are exclusively insulating or metallic challenging. Most research that addresses thermal MITs is limited by the domain knowledge of the scientists to a subset of MIT materials, and is often focused on a limited subset of possible features. Here, using a combination of domain knowledge and natural language processing (NLP) searches, we have built a material database comprising thermally driven MITs, as well as metals and insulators with similar chemical composition and stoichiometries to the MIT compounds. We featurized this dataset using a wide variety of compositional, structural, and energetic descriptors, including two MIT-relevant energy scales, the estimated Hubbard interaction and the charge transfer energy, as well as the structure-bond-stress metric referred to as the global-instability index (GII). We then performed supervised classification on this dataset, constructing three electronic-state classifiers: metal vs non-metal (M), insulator vs non-insulator (I), and MIT vs non-MIT (T). This classification allows us to identify new features separating MIT materials from non-MIT materials. These include the 2D feature space consisting of the average deviation of the covalent radius and the range of the Mendeleev number. We discuss the relationship of these atomic features to the physical interactions underlying MITs in the rare-earth nickelate family. We then elaborate on other features (GII and Ewald energy), and examine how they affect the classification of binary vanadium and titanium oxides. Last, we implement an online version of the classifiers, enabling quick probabilistic class predictions by uploading a crystallographic structure file. The broad accessibility of our database, newly identified features, and user-friendly classifier models will aid in accelerating the discovery of MIT materials.

1 Introduction

Metal-insulator transition (MIT) materials undergo an electronic phase change from a metallic to an insulating state as a function of applied external conditions, e.g., temperature, pressure, or doping. The transition is typically discerned through optical and/or transport measurements.¹ Predicting whether a material is prone to undergo an MIT is an ongoing research area^{2, 3, 4} of high technological importance. MIT materials may deliver new “steep slope” transistors that operate at very low voltage for beyond-Boltzmann-based computation^{5, 6} or function as “smart” components in thermochromic windows.⁷ To that end, there is strong interest in discovering new materials exhibiting MITs with improved properties.

Rational design of MIT materials, however, has proven to be difficult. One reason for this is that the electronic phase transition may arise from a variety of (possibly competing) mechanisms (see Fig. 1 for a non-exhaustive list). The transition mechanism is often the subject of intense debate, and the discussions are often limited to select chemistries and crystal structures⁸. The transition is often characterized by certain microscopic physical observables that differentiate the metallic from the insulating state, which are then treated as variable order parameters. In cases where the order parameters are known, there is often ambiguity over whether the transition is driven by the electronic or lattice order parameters, which hinders subsequent control and optimization of the transition characteristics, as well as over the role of the other structural and electronic modes in tuning the relative energies that characterize the transition^{9, 10, 11}. While significant progress has been made recently in disentangling the electronic and lattice degrees of freedom in the low-temperature insulating state¹², reliably predicting the temperature dependence of the electronic properties of a material, and particularly whether a material with an insulating state at $T=0$ K will transition to a metallic state (or vice versa) as the temperature is raised, is beyond currently available high-throughput methods.

Progress in the synthesis of high-quality materials, novel characterization methodologies, and advances in the quantum-mechanical modeling and theory of electron correlations have led to the recognition that subtle details in the crystal and local structure are indeed essential to describing MITs.^{13, 14, 15, 12} Displacive distortions to the size, shape, and connectivity of the basic metal-oxygen polyhedra or shortening of metal-metal distances in transition metal compounds are common in a number of these materials, which adopt different crystal structures (Fig. 1). This is best exemplified in the thermal MIT in VO₂, which appears to be described by a Mott-assisted Peierls transition, rather than exclusively Mott-Hubbard or Peierls-type physics^{8, 16, 17}. More recently, the perovskite oxide rare-earth nickelate family^{12, 11, 18, 10, 19, 20, 21, 22, 23, 24} and the Ruddlesden-Popper ruthenate Ca₂RuO₄^{12, 25, 26, 27, 28, 29} have been the subject of intense study, with both theoretical and experimental work focused on identifying and understanding the MIT mechanism to enable phase control. Improvements in sample quality have allowed for new metal-insulator transitions to be discovered even in previously known materials³⁰. Although most MIT materials are oxides, there is an increasing amount of work focused on the discovery of materials with anions other than oxygen^{31, 32}.

Figure 1: Relationship between atomic distortions and the physical interactions driving MITs in transition metal compounds comprised of diverse chemistries and structure types.

Despite this progress, there are still fewer than a hundred inorganic materials that exhibit thermally driven MITs (Fig. 2). Further, despite their scarcity, before this work there was no standard library of MIT materials available to the general scientific audience, which has slowed down the study of known MIT materials, and most likely reduced the rate at which new materials are discovered. Experimental databases of synthesized and predicted inorganic materials, e.g., the ICSD³⁵ or SpringerMaterials³⁶, lack the information necessary to assign an electrical-conductivity class label: always metallic, always insulating, or exhibiting a thermal MIT. Although high-throughput first-principles databases exist, e.g., Materials Project³⁷, OQMD³⁸, and AFLOW³⁹, the methods used to compute the data often omit essential microscopic interactions and corrections to standard density functional theory (DFT) exchange-correlation functionals that could capture the MIT physics. DFT is also a $T=0$ K theory. While these databases have been used to successfully build machine-learning classifiers for separating metals and insulators, as differentiated at a band-theory level (see Table S2 of the Supporting Information, SI), they often do not include the relevant physics to describe MIT materials families. In particular, they do not model in sufficient detail the effect of electron-electron interactions that are crucial to understanding the opening of the band gap in correlated materials. High-throughput DFT without the use of appropriate microscopic models⁴⁰ or corrections specific to correlated materials can lead to the incorrect classification of MIT materials and some insulators as metals. For example, the insulator LaTiO₃ and the MIT material NdNiO₃ are both listed as having 0 eV band gaps in Materials Project. Indeed, Fig. 2 shows that a large number of materials, including insulators, would be classified as metals based on the simulated 0 eV band gap from Materials Project, providing a poor starting point for any further classification. The combination of materials scarcity and inaccurate descriptions of the ground state is among the main difficulties in building machine-learning models for discovering and understanding correlated materials. Furthermore, limited efforts have focused on identifying whether any of these materials possess thermally driven MITs at finite temperatures, as the first-principles calculations often only report 0 K data. Current theoretical methods are often insufficient to understand the complex temperature-dependent electron-lattice interplay leading to an MIT^{12, 40}.

Here, we address these limitations and build a database of experimentally confirmed temperature-driven MIT compounds through a combination of domain knowledge and natural language processing (NLP). We then augment this database with structurally and compositionally related materials that are exclusively metallic or insulating. We featurize the complete dataset using atomic, electronic, and structural descriptors along with MIT-specific features, including an unscreened Coulomb interaction through an estimated Hubbard $U$ energy, $U_{\mathrm{est}}$ , an estimated charge transfer energy, $\Delta_{0}$ , and the global-instability index (GII). After training multiple supervised learning models for the three classification tasks, i.e., metal vs non-metal (M), insulator vs non-insulator (I), and MIT vs non-MIT (T), we identify new features whose interplay separates MIT materials from non-MIT materials. Analysis of the SHAP scores of the T-model led us to identify two previously unappreciated descriptors that offer significant class separation without requiring sophisticated computational techniques: ( $i$ ) the average deviation of the covalent radius (ADCR), a feature that describes the relative size difference among the elements comprising a compound, and ( $ii$ ) a compositional feature called the range of the Mendeleev number. The GII and Ewald energy are also identified as important features. We then examine the role these features play in the MITs exhibited by binary vanadates, titanates, and complex rare-earth nickelates. Finally, we describe an online tool comprising the three binary classifiers, which enables a user to upload a crystal structure file to obtain three probabilities of it being identified as a metal, insulator, or MIT compound.

2 Methods

Feature-based supervised learning involves data acquisition, feature engineering, and model building. The main result of our data acquisition is a database containing 343 materials, each labeled as a metal, insulator, or metal-insulator transition compound, based on available experimental literature. At the time of this publication, there are 96 metals, 179 insulators, and 68 MIT materials in the dataset. Next, we obtained a crystal structure for each material via one of the following methods, in order of descending preference: retrieval from an experimental library (ICSD or Springer), retrieval from the Materials Project, or in-house generation, as described in the SI.⁴¹ The crystal structure of each material was then automatically featurized using common descriptors from Magpie⁴² and those obtained from domain knowledge.

We labeled materials as MIT compounds if experimental literature on them shows that they exhibit an insulating ${\partial\rho}/{\partial T}<0$ temperature-dependent resistivity on one side of a critical transition temperature $T_{\mathrm{MIT}}$ , and a metallic ${\partial\rho}/{\partial T}>0$ temperature-dependent resistivity on the other side of $T_{\mathrm{MIT}}$ . When there was ambiguity regarding the change in sign of the experimental ${\partial\rho}(T)/{\partial T}$ data, we used additional experimental data to determine the class label. For example, if optical data shows a finite charge gap at temperatures below $T_{\mathrm{MIT}}$ , but none above it (as is the case with the MIT in Sr₃Fe₂O₇ ⁴³), we assigned an MIT label. Such subtleties are common among transition metal compounds, which are also prone to non-stoichiometry that can influence class assignment, especially among less studied materials.
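The sign criterion above can be illustrated with a minimal sketch. It assumes a clean, noise-free $\rho(T)$ curve and uses simple finite differences; the function name `label_from_resistivity` and the example numbers are hypothetical, and the labels in the database were assigned by manual assessment of the literature, not by a script like this one.

```python
def label_from_resistivity(temps, rhos):
    """Label a rho(T) curve by the sign of its finite-difference slope.

    Returns "metal" if d(rho)/dT > 0 on every interval, "insulator" if it is
    < 0 on every interval, "MIT" if the sign changes exactly once, and
    "ambiguous" otherwise (e.g. noisy data with several sign changes).
    """
    if len(temps) != len(rhos) or len(temps) < 3:
        raise ValueError("need at least three (T, rho) points")
    pairs = sorted(zip(temps, rhos))  # order the points by temperature
    signs = []
    for (t0, r0), (t1, r1) in zip(pairs, pairs[1:]):
        if t1 == t0:
            raise ValueError("duplicate temperature values")
        slope = (r1 - r0) / (t1 - t0)
        if slope != 0:  # flat intervals carry no sign information
            signs.append(1 if slope > 0 else -1)
    if not signs:
        return "ambiguous"
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    if changes == 0:
        return "metal" if signs[0] > 0 else "insulator"
    if changes == 1:
        return "MIT"
    return "ambiguous"


# Hypothetical curve: resistivity falls on warming, then rises above ~200 K.
print(label_from_resistivity([100, 150, 200, 250, 300], [10.0, 5.0, 1.0, 1.2, 1.5]))
```

Real transport data would need smoothing and a noise tolerance before such a rule is usable, which is one reason complementary optical data is consulted in ambiguous cases.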

Readers can view the class-label assignments for each material at Ref. 41. This electronic database hosts the latest experimental information on thermal MIT compounds and related metals and insulators. The data we present herein represents the state-of-knowledge on the class labels for materials available at the time of publication.

2.1 Data Acquisition and Database Construction

Fig. 3 illustrates the process by which we built the database for training the electronic classification models. We used a combination of domain knowledge and natural language processing (NLP) to generate both an initial materials database and a keyword list for the NLP pipeline. The materials database initially included all MIT compounds known by the authors along with related materials, including binary vanadates V$_{n}$O$_{m}$, $R$ NiO₃ nickelates ( $R$ a rare-earth metal), LaCoO₃, and some Ruddlesden-Popper oxides (e.g., Ca₂RuO₄), as well as lacunar spinels. Compounds with similar chemistries and stoichiometries that are exclusively metallic or insulating were then also added to the database.

To extend the database beyond the materials known to the authors, we also used natural language processing (NLP). The text corpus used for the NLP methods included 70,123 papers, which were down-selected from an entire text corpus of over 4 million scientific papers and journals. Down-selection to the highly specialized MIT text corpus was performed using keywords (Table S1) describing MITs and correlated electron systems from the authors’ domain knowledge, which were matched against words in the titles, abstracts, and introductory paragraphs of papers. New MIT compounds identified from two types of NLP searches (vide infra) were then verified manually by assessing experimental transport (and/or optical) data. The newly identified compounds were each assigned a metal, insulator, or MIT class label and added to the database. Based on these identifications, new keywords were also added to guide the NLP search or additional human searches to find more compounds relevant to the classification tasks. This process was repeated until the database acquired 343 unique entries.
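As a rough illustration of the keyword down-selection step, the sketch below keeps a paper if any keyword appears in its title, abstract, or introduction. The record fields, the function names, and the two keywords are hypothetical stand-ins; the actual keyword list is the one given in Table S1, and the matching used in the real pipeline may differ.

```python
def matches_keywords(paper, keywords):
    """Return True if any keyword occurs in the paper's searchable text."""
    fields = ("title", "abstract", "introduction")
    text = " ".join(paper.get(field, "") for field in fields).lower()
    return any(keyword.lower() in text for keyword in keywords)


def down_select(corpus, keywords):
    """Keep only the papers that match at least one keyword."""
    return [paper for paper in corpus if matches_keywords(paper, keywords)]


# Hypothetical keywords and paper records, for illustration only.
keywords = ["metal-insulator transition", "mott insulator"]
corpus = [
    {"title": "Metal-insulator transition in a nickelate film", "abstract": ""},
    {"title": "Polymer rheology", "abstract": "Viscosity of melts."},
    {"title": "Optical study", "abstract": "A Mott insulator under pressure."},
]
selected = down_select(corpus, keywords)
print(len(selected), "of", len(corpus), "papers retained")
```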

Two NLP methods were used in our workflow. First, using the specialized MIT text corpus of approximately 70,000 papers and a state-of-the-art NLP pipeline,⁴⁴ we extracted chemical formulas of possible MIT compounds from each paper. This entity recognition process included tokenization of the relevant MIT text corpus, normalization to lemmatize each term, part-of-speech tagging using dependency parsers, and finally token classification. Using this NLP method, we identified the MIT compound PrRu₄P₁₂, which belongs to the skutterudite family frequently studied in thermoelectrics research. Its transition is attributed to the physics of Pr 4f electrons, rather than the Ru 4d electrons.⁴⁵ Although the authors were unaware of this compound prior to the NLP search, we were then able to use our MIT domain knowledge to perform a manual search of the literature, which identified SmRu₄P₁₂ as another skutterudite MIT material, as well as a wide range of other skutterudite materials that exhibit solely metallic or insulating behavior.

Second, we employed a FastText model trained directly on the specialized MIT text corpus. The resulting word embeddings were then compared in terms of cosine similarity to previously identified MIT materials. We used a different cosine-similarity approach than that in Ref. 46 to identify compounds of interest. For each compound in the original dataset of identified materials, the 100 words with the highest cosine similarity in the trained FastText model were identified. Then, using the metal, insulator, and MIT class labels present within the dataset, we grouped the closest 100 words for each compound into the closest words for a given label. Because there were approximately 50 temperature-driven MITs in the original dataset, this grouping method resulted in $\approx$ 5,000 entries (including repeats) that were closest in cosine similarity of word embeddings to known MIT compounds. Then, within the group for a given class label, the 20 most commonly occurring words were identified, resulting in 20 words for each classification label (60 words in total). Of the 60 words, the vast majority were chemical formulas with no noise. (Nine words were not exact chemical formulas of compounds and thus were considered to be noise. Although we assigned these words as noise, some were still relevant; for example, the word “La $A$ O₃” appeared in the 20 most common words associated with insulators, where $A$ represented an unidentified element on the periodic table.) We further simplified the search by keeping only those words which were exact chemical formulas. After these compounds were identified, abstracts and titles were searched for these specific compounds. Among this subset of literature, we then searched for the classifications of the compounds identified by the FastText model and added new materials to the database. The two NLP-assisted searches led to the addition of 116 compounds to the database (representing a 51% increase in size from the initial dataset), which was then augmented further by additional human searches using the new domain knowledge. Our database was expanded from about 190 unique compounds to the current 343 with the addition of the mixed human + NLP search. This expansion led to the identification, for example, of the skutterudite family, in which the MIT mechanism is driven by the rare-earth cations rather than the transition-metal-driven physics characteristic of the other materials in our database.
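The grouping procedure used in the second NLP search can be sketched in a few lines. This is a simplified illustration under stated assumptions: `embeddings` is a plain dictionary standing in for the trained FastText vectors, the toy two-dimensional vectors are invented, and all function names are hypothetical. The defaults `k=100` and `n_top=20` mirror the numbers quoted in the text.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_words(query, embeddings, k):
    """The k words most similar to `query`, excluding the query itself."""
    query_vec = embeddings[query]
    scored = [(cosine(query_vec, vec), word)
              for word, vec in embeddings.items() if word != query]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


def top_candidates_by_label(labeled, embeddings, k=100, n_top=20):
    """Pool each compound's k nearest words by class label, then keep the
    n_top most frequently occurring words for every label."""
    counts = {}
    for compound, label in labeled.items():
        neighbours = nearest_words(compound, embeddings, k)
        counts.setdefault(label, Counter()).update(neighbours)
    return {label: [word for word, _ in counter.most_common(n_top)]
            for label, counter in counts.items()}


# Toy 2D vectors (invented); real FastText embeddings have many dimensions.
embeddings = {
    "VO2": (1.0, 0.0), "V2O3": (0.9, 0.1), "NbO2": (0.95, 0.05),
    "Al2O3": (0.0, 1.0), "MgO": (0.1, 0.9), "SiO2": (0.05, 0.95),
}
labeled = {"VO2": "MIT", "V2O3": "MIT", "Al2O3": "insulator"}
print(top_candidates_by_label(labeled, embeddings, k=2, n_top=1))
```

The candidates returned this way are only suggestions; as described above, each one still has to be checked manually against experimental transport or optical data before it receives a class label.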

As a result of this search, we have built a database that is as representative as possible. Fig. 4 shows the distribution of the complete dataset obtained from this workflow at the date of submission of this paper. As expected for a database of materials dominated by transition metal oxides, the dataset is imbalanced, with the number of insulators significantly greater than that of metals or MIT materials. All known MITs fall into a relatively restricted class of compounds and most frequently appear as complex ternary materials, often transition metal oxides and sulfides. In order for our database to be focused on thermal MITs, we have excluded compositionally controlled MIT compounds. Our model then focuses on complex, inorganic materials involving d-shell or f-shell electrons as the valence states. Although an arbitrary increase in our database size beyond this limited scope may increase our classifier scores, this increase would not be meaningful, as we want the metals and insulators to have similar chemical composition and stoichiometry to the MIT compounds. The heatmap in Fig. 4 can be used as a quick guide to identify whether a new compound to be tested contains elements that are already in the training data. This would better inform the decision as to how much trust one could place in the classification provided by our ML classifiers. Analogous heatmaps for metals, insulators, or MIT compounds can be found in the SI.

As most MITs are accompanied by a structural change coupled to the electronic transition, an important question in building our model was which structure to select for featurization: the structure corresponding to the low-temperature, often insulating phase; the high-temperature, often metallic phase; or both? Across the electronic transition, a symmetry-lowering distortion typically occurs, with the low-temperature phase comprising small atomic displacements absent in the high-temperature structure (see local distortions illustrated in Fig. 1). For example, the rare-earth nickelates at low temperature exhibit a breathing distortion, i.e., small and large NiO₆ octahedra alternate in a 3D checkerboard pattern.^{47, 48} This type of symmetry breaking, and the microscopic model associated with it, is usually known only after the compound has been studied sufficiently to be labeled as an MIT material. Furthermore, most crystal structures are initially reported at room temperature, which can be far from $T_{\mathrm{MIT}}$ . To build a model that is both mechanism-agnostic and able to predict whether a material will have an MIT based on a simple theoretical or experimental structure before in-depth analysis can be performed, we opted to include only the high-temperature structures in our dataset, if available. This allows our model to learn the susceptibility of a compound to undergo an MIT using more readily available high-temperature structures.

2.2 Feature Engineering

Constructing the machine learning model requires tabulating appropriate features that describe the properties of each material for accurate class predictions. Our features include the Magpie⁴² composition feature set, oxidation state, Ewald energy, local and crystal structure parameters (i.e., variation in bond lengths and atomic volumes), and the global instability index (GII)⁴⁹, all of which are accessible from the Matminer⁵⁰ package. Certain descriptors known from the materials physics community to be important in describing MIT compounds, however, are unavailable in these standard libraries designed for machine learning. To that end, we constructed additional structural features intended to capture the displacive distortions shown in Fig. 1, and built featurizers that provide an estimate of the electronic energy scales used in the Zaanen-Sawatzky-Allen framework⁵¹ to separate metals from insulators.⁵²

We now describe some of the features determined to be important and their implementation. We begin by highlighting the range (or minimum) of the Mendeleev number and the average deviation of the covalent radius (see Fig. 5 for values used in this work). These two features both appear consistently as features with high importance in several training iterations, and have physically interpretable meanings. The Mendeleev number provides an alternative label beyond atomic number to distinguish elements with shared characteristics⁵³. The Mendeleev number generally (but not always) increases down the columns of the periodic table, then increases from left to right. This ordering is intended to bunch elements with similar chemical properties together by an expected oxidation state in most materials. To understand how this can be useful, consider the $AB$ O₃ perovskite oxides with $A$ a rare-earth and $B$ a transition metal. The minimum of the Mendeleev number will characterize the $A$ cation and the maximum of the Mendeleev number will characterize the O anion. Thus, the range of the Mendeleev number will be the difference between the Mendeleev numbers of the O anion and the $A$ cation. Since most compounds in our dataset are oxides with just a few being sulfides, the maximum of the Mendeleev number is effectively fixed for a substantial portion of our dataset, with only the minimum of the Mendeleev number varying between different compounds. This aspect leads to high correlation between the minimum and range Mendeleev number features: the range and minimum of the Mendeleev number have a linear correlation of $-0.995$ and can be considered equivalent features for most of our dataset. We chose to use the range of the Mendeleev number simply because it accounts for both the minimum and the maximum of the Mendeleev number, which may be important for other types of compounds (such as sulfides), and the effect of choosing one over the other is negligible on the performance of our models.

The average deviation of the covalent radius (ADCR) describes how different the covalent radii are for different elements in a compound:

$$\mathrm{ADCR}=\frac{1}{N}\sum_{i=1}^{N}|R_{i}-\bar{R}|, \qquad (1)$$

where $R_{i}$ is the covalent radius of atom $i$ , $\bar{R}$ is the average covalent radius over all atoms in the formula unit of a specific compound, and $N$ is the total number of atoms in the formula unit, so that each element is weighted by its stoichiometric fraction. For example, in an $A_{n}X_{m}$ compound,

$$\mathrm{ADCR}\,({A}_{n}{X}_{m})=\frac{n|R_{A}-\bar{R}|+m|R_{X}-\bar{R}|}{n+m} \qquad (2)$$

with the weighted average $\bar{R}=({nR_{A}+mR_{X}})/({n+m})$ . We use covalent rather than ionic radii as features, because while ionic radii are known to underlie structural stability in ionic crystals according to Pauling’s rules,⁵⁴ they rely upon knowledge of oxidation states and coordination environment. Further, some of our compounds exhibit non-integer oxidation states, or have oxidation states that are the subject of ongoing research (including many of the skutterudites), which means that including them would have introduced additional ambiguity into our model.
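A minimal sketch of the stoichiometry-weighted form in Eq. (2), generalized to any number of elements, is given below. The function name `adcr` and the dictionary-based interface are hypothetical, and the radii in the usage example (in pm) are illustrative inputs only; the values actually used in this work are those shown in Fig. 5.

```python
def adcr(composition, radii):
    """Average deviation of the covalent radius for one compound.

    composition: element symbol -> number of atoms per formula unit
    radii:       element symbol -> covalent radius (any consistent unit)
    Each element is weighted by its stoichiometric amount, as in Eq. (2).
    """
    n_atoms = sum(composition.values())
    if n_atoms <= 0:
        raise ValueError("composition must contain at least one atom")
    mean_radius = sum(n * radii[el] for el, n in composition.items()) / n_atoms
    deviation = sum(n * abs(radii[el] - mean_radius)
                    for el, n in composition.items())
    return deviation / n_atoms


# Illustrative radii in pm (assumed inputs, not taken from the paper).
example_radii = {"V": 153.0, "O": 66.0}
print(adcr({"V": 1, "O": 2}, example_radii))
```

For this VO₂-like input, $\bar{R}=(153+2\times 66)/3=95$ pm, so the deviations are 58 pm for V and 29 pm for each O, giving $\mathrm{ADCR}=(58+2\times 29)/3\approx 38.7$ pm.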

Now, we turn to some of the features for which we have built our own featurizers. We begin with the structural global-instability-index (GII) descriptor, which is defined as

$$\mathrm{GII}=\left({\frac{1}{N}\sum_{i=1}^{N}d_{i}^{2}}\right)^{1/2}, \qquad (3)$$

where $d_{i}=\mathrm{BVS}(i)-V_{i}$ is the difference between the bond valence sum (BVS) for the $i$th ion and its formal valence $V_{i}$ , and $N$ is the number of ions in the unit cell.^{55, 56} The GII is the root-mean-square deviation of the bond valence sums from the formal valences, taken over all atoms in the cell. This can be understood as approximating an average structural stress inherent in a material, as it captures the average deviation in bond lengths from what would be experimentally expected. The stresses arise from a combination of over- and underbonded cation-ligand interactions and thus describe bond strains compatible with a given structure and crystallographic symmetry.⁵⁷
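Eq. (3) can be evaluated directly once the bond valence sums are known, as in the sketch below. The helper `bond_valence_sum` uses the common empirical form $s=\exp[(R_{0}-d)/b]$ with $b\approx 0.37$ Å, which is an assumption added here for illustration and is not spelled out in the text; the function names, the $R_{0}$ parameter, and all numbers are hypothetical, and valences are passed as magnitudes.

```python
import math


def bond_valence_sum(bond_lengths, r0, b=0.37):
    """Sum of bond valences s = exp((r0 - d) / b) over one ion's bonds.

    r0 is the tabulated bond-valence parameter for the cation-anion pair
    and b the empirical softness constant (both in Angstrom).
    """
    return sum(math.exp((r0 - d) / b) for d in bond_lengths)


def gii(bvs_values, formal_valences):
    """Root-mean-square deviation of bond valence sums from formal valences.

    Both inputs list one entry per ion in the unit cell, as magnitudes.
    """
    if len(bvs_values) != len(formal_valences) or not bvs_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(s - v) ** 2 for s, v in zip(bvs_values, formal_valences)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ions: one slightly overbonded cation, two anions.
print(gii([4.1, 1.9, 2.0], [4, 2, 2]))
```

A structure whose bond valence sums match the formal valences exactly gives a GII of zero, and larger values indicate more strained bonding.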

Next, we constructed additional structural features, which in turn allowed us to build the electronic features from the Zaanen-Sawatzky-Allen (ZSA) framework. The structural features include: the minimum, maximum, and mean distances between transition metal $M$ cations; the minimum, maximum, and mean distances between the transition metal and the ligand $L$ ; and the Madelung site potentials for the cations and anions. These features were then used to calculate approximations⁵² for the relevant ZSA energy scales: the difference between the highest occupied and lowest unoccupied metal orbitals (an estimated Hubbard $U$ , hereafter $U_{0}^{\prime}$ ) and a charge-transfer energy gap ( $\Delta_{0}$ ). Both electronic parameters originate from an ionic model involving local charge excitations with

$$U^{\prime}_{0}=I_{v+1}(M)-I_{v}(M)-e^{2}/d_{M-M}\,, \qquad (4)$$

where $I_{v}=A$ is the electron affinity of $M^{v+}$ , $I_{v+1}$ is the ionization potential, and $e^{2}/d_{M-M}$ is the Coulomb attraction between the excited electron on one transition metal cation and the hole left behind on its nearest-neighbor cation. The estimated charge-transfer energy is

$$\Delta_{0}=e\Delta{}V_{M}+I(L^{n-})-I_{v}(M)-e^{2}/d_{M-L}\,, \qquad (5)$$

where $e\Delta{}V_{M}$ is the product of the electron charge and the difference in electrostatic Madelung site potentials between the cation and ligand sites; $I(L^{n-})$ is the ionization potential of the ligand in the $-n$ oxidation state, e.g., $I(\mathrm{O}^{2-})=-7.7$ eV for oxygen ligands; $I_{v}$ is as before; and $e^{2}/d_{M-L}$ is the Coulomb attraction between the excited electron on the metal and the hole left behind on its ligand. Code for calculating $U^{\prime}_{0}$ and $\Delta_{0}$ was adapted from Ref. 58 and is available in Ref. 41. The ionization potentials and electron affinity values were web-scraped from the NIST Atomic Spectra Database⁵⁹.
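To make the units in Eqs. (4) and (5) concrete, the sketch below evaluates both expressions with energies in eV and distances in Å, using $e^{2}/(4\pi\varepsilon_{0})\approx 14.3996$ eV Å for the Coulomb terms. It is not the code from Ref. 41: the function names and the input numbers are hypothetical, except for $I(\mathrm{O}^{2-})=-7.7$ eV, which is quoted in the text.

```python
# e^2 / (4 * pi * eps0) expressed in eV * Angstrom.
COULOMB_EV_ANGSTROM = 14.3996


def hubbard_u_estimate(i_v_plus_1, i_v, d_mm):
    """Eq. (4): U'_0 = I_{v+1}(M) - I_v(M) - e^2 / d_{M-M}.

    i_v_plus_1, i_v: energies in eV; d_mm: metal-metal distance in Angstrom.
    """
    return i_v_plus_1 - i_v - COULOMB_EV_ANGSTROM / d_mm


def charge_transfer_estimate(delta_v_m, i_ligand, i_v, d_ml):
    """Eq. (5): Delta_0 = e*DeltaV_M + I(L^{n-}) - I_v(M) - e^2 / d_{M-L}.

    delta_v_m: Madelung site-potential difference in volts, so that
    e*DeltaV_M has the same numerical value in eV; d_ml in Angstrom.
    """
    return delta_v_m + i_ligand - i_v - COULOMB_EV_ANGSTROM / d_ml


# Hypothetical inputs apart from I(O2-) = -7.7 eV.
print(hubbard_u_estimate(i_v_plus_1=35.0, i_v=20.0, d_mm=4.0))
print(charge_transfer_estimate(delta_v_m=40.0, i_ligand=-7.7, i_v=20.0, d_ml=2.0))
```

In the featurization workflow described above, the distances and Madelung potentials come from the crystal structure and the ionization energies from tabulated atomic data.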

2.3 Supervised Learning Scheme

2.3.1 Machine Learning Algorithm

To determine which machine learning model is best suited for our classification task, six different models were used in the model selection process: dummy classifiers with random guessing, linear logistic regression models with L2 regularization, generic decision tree models, random forest classifiers, gradient-boosting classifiers, and extreme gradient-boosting classifiers as implemented in XGBoost.⁶⁰ Model hyperparameters were optimized using grid search on the training split in stratified 5-fold cross-validation and are available at Ref. 41.
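Because the dataset is imbalanced, the stratification step matters: each fold should contain roughly the same fraction of each class. The pure-Python sketch below shows one way to build such folds for the T task, using the class counts quoted earlier (68 MIT compounds out of 343). The function name `stratified_kfold` is hypothetical, and in practice a library routine would normally be used in place of this hand-written splitter; the hyperparameter grid search itself is not shown.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Split sample indices into n_splits folds with similar class ratios.

    Indices of each class are shuffled with a seeded generator and then
    dealt round-robin into the folds, so per-class counts differ by at
    most one between folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for members in by_class.values():
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % n_splits].append(index)
    return folds


# T task: 68 MIT compounds versus 275 non-MIT compounds.
labels = ["MIT"] * 68 + ["non-MIT"] * 275
for fold in stratified_kfold(labels, n_splits=5, seed=0):
    n_mit = sum(1 for i in fold if labels[i] == "MIT")
    print(len(fold), "samples,", n_mit, "MIT")
```

Each fold serves once as the validation set while the remaining four are used for training, and changing `seed` produces a different but equally balanced partition.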

Tree-based ensemble methods have been demonstrated empirically to be very efficient machine-learning models that also provide interpretability,^{61, 50} making it easier to expand our domain knowledge. XGBoost, in particular, is an optimized distributed gradient-boosting library designed to be highly efficient, flexible, and portable. We found that the XGBoost models were consistently among the best-performing models and were relatively fast to train compared to random forest and gradient-boosting models, as described in the SI. The accessibility of our code allows users to test their own structures either in a browser via the Binder service, which we discuss below, or on a personal computer. For these reasons, all classifiers presented here are based on XGBoost models, which are trained on two different feature sets: a full feature set and a reduced feature set, as described next.

2.3.2 Feature Selection

In order to obtain an easily interpretable model, and to avoid possible overfitting due to the large number of features, we performed a downselection to certain key features. This selection follows an iterative approach. In the first iteration, our raw feature set included 164 features (163 numeric features and 1 one-hot-encoded categorical feature with 2 levels), which is large compared to the number of compounds. To reduce the feature space complexity, we first removed any numeric features with 0 variance or with an absolute value of linear correlation greater than 0.95 with other features. This resulted in 106 features and is referred to as the full feature set.
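The first filtering pass can be sketched as follows. This is a simplified stand-in rather than the code used for the paper: the function names and the toy feature values are hypothetical, and because the text does not say which member of a highly correlated pair is discarded, the sketch assumes the feature encountered first is kept.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_features(columns, threshold=0.95):
    """Drop zero-variance features and features whose absolute correlation
    with an already-kept feature exceeds the threshold.

    columns: feature name -> list of values (one per compound).
    Assumption: of two correlated features, the earlier one is kept.
    """
    kept = []
    for name, values in columns.items():
        if max(values) == min(values):
            continue  # zero variance: carries no information
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            continue  # nearly redundant with a feature already kept
        kept.append(name)
    return kept


# Toy feature table with invented values for four compounds.
columns = {
    "range_mendeleev": [70, 60, 50, 40],
    "min_mendeleev": [17, 27, 37, 47],   # perfectly anti-correlated with range
    "constant": [1, 1, 1, 1],            # zero variance
    "gii": [0.1, 0.3, 0.05, 0.2],
}
print(filter_features(columns))
```

In this toy table the minimum and range of the Mendeleev number are perfectly anti-correlated, so only one of them survives, mirroring the near-equivalence of these two features noted earlier.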

Principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP)⁶² are often used to reduce the number of features. The linear and/or nonlinear combination of the original features used in these approaches to create new features, however, makes it difficult to interpret the physical meaning of each new descriptor. Since we desired to preserve the physical meaning of our features, we used a combination of Shapley additive explanations (SHAP)⁶³ and domain knowledge (physical intuition) to downselect the features in the second iteration. For each of the three binary classifiers, SHAP analysis on the full feature set was used to find the 10 most important features, i.e., the top 10 features with the highest average absolute SHAP values. From this SHAP analysis, 6 features were selected and then combined with 4 features chosen using domain knowledge, which resulted in a total of 10 features. This second feature set is referred to as the reduced feature set. Both the full and reduced feature sets are available at Ref. 41.
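The ranking criterion, the mean absolute SHAP value per feature, is straightforward to compute once a matrix of SHAP values is available. The sketch below assumes such a matrix has already been produced by a SHAP explainer (one row per compound, one column per feature); the function name and the numbers are hypothetical.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_n=10):
    """Return the top_n feature names ordered by mean absolute SHAP value.

    shap_values:   list of rows, one per sample, each with one SHAP value
                   per feature (same order as feature_names).
    """
    n_samples = len(shap_values)
    if n_samples == 0:
        raise ValueError("need at least one row of SHAP values")
    scores = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_samples
        scores.append((mean_abs, name))
    scores.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scores[:top_n]]


# Invented SHAP values for two compounds and three features.
shap_values = [[0.5, -0.1, 0.0], [-0.4, 0.2, 0.05]]
feature_names = ["adcr", "gii", "ewald_energy"]
print(rank_by_mean_abs_shap(shap_values, feature_names, top_n=2))
```

Taking the absolute value before averaging is what lets a feature that pushes some predictions up and others down still rank as important.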

We also note that the SHAP scores of each feature are highly dependent on the training dataset: minor updates to the dataset result in significant changes in the SHAP importance scores, which suggests that these measures may not be reliable if taken individually. This is likely a result of our small dataset: in the full feature set, we have 343 compounds and 106 features to describe them, leading to potential overfitting, which will be further addressed in future versions of the code. As a result, we handpicked a combination of features that either appear consistently as important in our SHAP analysis or that we believed to be relevant from physical intuition. For instance, as we increased the number of compounds and trained models on the different iterations of the dataset, we found several features that consistently exhibited high SHAP values, such as the global instability index and the average deviation of the covalent radius. These features were then combined with those deemed important from materials domain knowledge, such as the average metal-metal distance, the Hubbard $U$ strength, and the charge transfer energy, to form the reduced feature set. The physical interpretation of these features is discussed in more detail below.

2.3.3 Model Metrics

We computed model classification performance metrics such as receiver operating characteristic (ROC) curves and precision-recall curves with stratified cross-validation splits. Because the splits depend on the random seed used to generate them, and performance can vary with the train-validation splits produced by different seeds, we performed cross-validation using 10 random seeds with integer values from 0 to 9. For each of the 10 seeds, we performed a stratified 5-fold cross-validation, from which we calculated a median value. All metric values we report hereafter are the median of those 10 median values, along with their interquartile range. We carried out all of our cross-validations with the scikit-learn⁶⁴ Python package. Weighted F₁ scores that take class imbalance into account were also used for model assessment.
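The aggregation described above (a median over the folds for each seed, then a median and interquartile range over the per-seed medians) can be sketched with the Python standard library. The function name and the choice of the "inclusive" quartile convention are assumptions; the text does not state which quartile definition was used.

```python
from statistics import median, quantiles


def summarize_cv(scores_by_seed):
    """Aggregate cross-validation scores across random seeds.

    scores_by_seed: dict mapping seed -> list of per-fold metric values.
    Returns (median of per-seed medians, interquartile range of per-seed medians).
    """
    per_seed_medians = [median(folds) for folds in scores_by_seed.values()]
    # Quartiles of the per-seed medians; the "inclusive" convention is assumed.
    q1, _, q3 = quantiles(per_seed_medians, n=4, method="inclusive")
    return median(per_seed_medians), q3 - q1


# Hypothetical scores: 5 seeds, 5 folds each (not results from this work).
toy_scores = {seed: [float(seed + 1)] * 5 for seed in range(5)}
center, spread = summarize_cv(toy_scores)
```

Reporting the median and interquartile range rather than the mean and standard deviation makes the summary less sensitive to a single unusually easy or hard split.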

For a dataset this small, with only 343 training examples, a separate test set usually should not be used to evaluate model performance. It was also infeasible to set aside a hold-out set since, to the best of our knowledge, our current temperature-driven MIT materials database is exhaustive. However, we will address this issue in future versions of our code, particularly as our database expands. We therefore use the cross-validation performance, obtained from 10 random seeds each with stratified 5-fold splits, as a proxy for the performance on an actual test set. Nonetheless, for completeness we did include a 90%-10% train-test split evaluation using the same 10 random seeds, which resulted in similar performance, as reported in the SI. The key difference is that in the original cross-validation approach the hyperparameter tuning process uses the entire dataset, whereas in the train-test split it uses only 90% of the data; the latter performance is thus more indicative of the models’ extrapolative power.

3 Results

3.1 Classifier Performance

We first present performance results for our three binary classifier models, as they perform significantly better than a single ternary classifier (see Figure S1 of SI). Classifier M distinguishes between metals and non-metals, where the non-metals include compounds labeled as insulators (I) or MIT compounds (T). Classifier I is the analogous classifier for insulators, with the non-insulators comprising M and T compounds. Classifier T distinguishes MIT compounds from non-MIT compounds, i.e., metals and insulators.

All model performance metrics presented are from models trained on the reduced feature set. The corresponding metrics from models trained on the full feature set are available at Ref. 41. The M and T classifiers retain the same level of performance when trained on the reduced feature set as when trained on the full feature set, whereas the I classifier performs slightly worse on the reduced feature set (see Figure S2).

To visualize model performance, ROC curves for our machine learning models are constructed from the 5 cross-validation folds under each of the 10 random seeds (Fig. 6). The ROC curve plots the true-positive rate, which is the ratio of the number of correctly identified members of the positive class (true positives) to the total number of members of the positive class (the sum of true positives and false negatives), against the false-positive rate, which is the ratio of false positives to the sum of false positives and true negatives. An ROC curve confined primarily to the upper left corner indicates a model that correctly identifies instances of each class without many false positives. An area under the ROC curve (AUC) of 1 represents perfect separation, whereas an AUC of 0.5 is equivalent to random guessing. Tighter bunching of the lines indicates less variance in model performance with varying random seeds. We obtain a median ROC area under curve (ROC-AUC) of 0.90 and 0.89 with interquartile ranges of 0.02 and 0.01 for the M and I classifiers, respectively. Remarkably, our novel T classifier exhibits a median ROC-AUC of 0.90 with an interquartile range of 0.03, indicating that its overall ability to discriminate between the two classes is high.
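To make the definitions of the true-positive rate, the false-positive rate, and the AUC concrete, the following is a minimal sketch in plain Python that sweeps the decision threshold over a list of predicted scores and integrates the resulting curve with the trapezoidal rule. The function names and the toy labels and scores are illustrative assumptions and are not taken from the models in this work.

```python
def roc_points(y_true, scores):
    """Return (false-positive rate, true-positive rate) pairs.

    y_true: list of 0/1 labels (must contain both classes).
    scores: predicted scores, higher meaning "more likely positive".
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Lower the threshold past all samples that share the same score.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example (hypothetical): the two positives receive the highest scores.
perfect = auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
```

When every sample receives the same score, the curve reduces to the diagonal and the area is 0.5, which matches the random-guessing baseline mentioned above.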

Fig. 7 presents the precision (proportion of true positives to the sum of true positives and false positives) and recall (proportion of true positives to the sum of true positives and false negatives) curves of each binary classifier to better understand performance owing to imbalance among the classes. The median and interquartile range of the cross-validation weighted F₁ scores (harmonic mean of precision and recall that takes class imbalance into account) are 0.86 (0.03), 0.82 (0.02), and 0.88 (0.01) for the M, I, and T classifiers, respectively.
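The precision, recall, and weighted F₁ definitions given above can be written out directly. In this sketch the weighted F₁ is taken to be the per-class F₁ averaged with weights proportional to each class's number of true instances; this is one common convention and is assumed here, since the text does not spell out the weighting. All names and labels are illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_f1(y_true, y_pred):
    """Support-weighted average of the per-class F1 scores (assumed convention)."""
    n = len(y_true)
    total = 0.0
    for label in sorted(set(y_true)):
        support = y_true.count(label)
        total += (support / n) * precision_recall_f1(y_true, y_pred, label)[2]
    return total


# Toy imbalanced example (hypothetical): one positive among four samples.
labels = [1, 0, 0, 0]
predictions = [1, 1, 0, 0]
score = weighted_f1(labels, predictions)
```

In this toy case the single false positive halves the precision of the minority class while barely affecting the majority class, which illustrates why the metrics of a rare class are the most sensitive to a few errors.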

On the one hand, we find that the area under the precision-recall curve for Classifier T is rather low. Indeed, MITs are the least represented class in the dataset owing to the small number of known thermally driven MIT materials. The poor precision-recall performance could perhaps be overcome with additional data, as seen in other works (Table S2). As more positive examples (MIT compounds) are added to the dataset, we expect the precision of Classifier T to improve because the under-representation of MITs in the training set may lead to under-prediction of MITs, which results in a smaller number of true positives and thus a lower precision. On the other hand, the performance of Classifiers M and I is comparatively better than that of Classifier T since the dataset contains more metals and insulators. As a result, the models were able to better separate metals from non-metals and insulators from non-insulators. In other words, these two classifiers exhibit better performance because their ratio of positive class to negative class is more balanced than that of Classifier T.

Compared with electronic state classifiers formulated in earlier works using various databases and different descriptors (Table S2), our models’ performance metrics for the M and I models are comparable. However, we note that the previous models were not intended to learn correlated electron materials, and their metrics if trained on our sparse dataset would likely be different. Therefore, the comparison is not strictly appropriate.

We also created a survey that let domain experts (e.g., materials scientists) classify the conductivity class of 18 compounds (6 metals, 7 insulators and 5 MIT compounds). The goal was to establish a human performance baseline for the 3 aforementioned classifiers and to evaluate whether identifying MIT compounds is a trivial task for human scientists. Unsurprisingly, the XGBoost classifiers outperform the average human scientist in every classification task (see Figure S3 of SI).

3.2 Feature Importance and Physical Interpretation

We now use a combination of domain knowledge, SHAP values, and Accumulated Local Effect (ALE) analysis to examine the role of different features in the T Classifier trained on the reduced feature set. We want to know how the model learns to differentiate an MIT compound from those that are exclusively metallic or insulating. SHAP values indicate which features are important and how the features affect the classification (i.e., how changing the value of a feature changes the classification). ALE plots play a similar role in elucidating the importance of each feature in classifying a material, as well as the role of the interactions between the features. Some of the most illustrative ALE plots can be found in the SI. Fig. 8 shows the rank-order SHAP importance of each feature as well as each feature’s relative role in the MIT compound classification (e.g., whether a material having a small value of GII makes it more likely to be classified as an MIT compound or not).

The most important feature in our classification is the average transition metal-transition metal distance, which we refer to as the metal-metal distance for brevity, as shown in Fig. 8. The ALE decomposition shows that the importance of this feature largely arises from its interaction with other features. This finding agrees well with our physical intuition. In transition metal compounds, the metal-metal distance may determine the electronic bandwidth $W$, which competes with other energy scales that drive MITs, such as the Hubbard $U$, as described for example in the ZSA classification scheme. Indeed, Fig. 9 shows a 2D scatter plot of the distribution of materials as a function of the metal-metal distance and Hubbard $U$, together with the ALE plot for these two features. We find that most of the well-studied MIT materials (mostly perovskite compounds) fall within a narrow region identified as high in probability based on these two features. Interestingly, for some materials for which the mechanism is largely unknown (SmRu₄P₁₂ and PrRu₄P₁₂), but widely assumed to be different from that of most other MIT materials, these two features do not contribute significantly to the MIT classification. This agrees with previous theoretical work, which has suggested that the relevant MIT physics for these materials is not driven by the transition metal ion electrons at all, but by the Pr and Sm f-electrons instead.⁶⁵

For these two compounds, the SHAP feature importances instead indicate that the relevant features for the classification are the GII, the charge transfer energy, and the transition metal-ligand distances (see Figure S10 in the SI for the corresponding SHAP plots), providing a possible hint as to the relevant physics in these materials. The GII and ADCR are two features that we find to be consistently important, and novel, in our model. The GII has previously been related to MIT temperatures in certain materials families, for example in the $Rn$Cu₃Fe₄O₁₂ family.⁶⁶ Fig. 10 shows that most MIT compounds exhibit ADCR values between $30\,\mathrm{pm}$ and $50\,\mathrm{pm}$ and GII values between 0.1 and 0.5. The moderate to high GII values for most MIT materials are consistent with our understanding that these thermally driven MITs are assisted by a minor structural instability, which can alleviate bond stresses. Materials with high GII values may be too chemically unstable to support this type of mechanism. We find that the MIT compound with the lowest GII is V₂O₃. Overall, a low GII tends to favor an insulating state, and a higher GII favors a metallic or MIT state. For example, among binary oxides with the rutile structure and composition $M$O₂, we find $\mathrm{GII}(\mathrm{TiO}_{2})=0.11$, $\mathrm{GII}(\mathrm{VO}_{2})=0.13$, and $\mathrm{GII}(\mathrm{MoO}_{2})=0.32$. As most of our compounds are oxides, and most of those are insulators with a low GII, we deduce that materials with a low GII are highly stable from a bond-stress assessment and are unlikely to be MIT compounds, which is consistent with the GII SHAP data in Fig. 8.

We glean some understanding of the ADCR relevance by focusing on the perovskite $R$NiO₃ nickelates, with $R$ a rare-earth cation. (A similar analysis can also be done on other perovskite families such as $R$CoO₃ or $Rn$Cu₃Fe₄O₁₂, with $R$ defined as before and $Rn$ further including Ho, Tb, and Tm.) In cubic perovskites $AB$O₃, including $R$NiO₃, the ADCR is linearly correlated with the Goldschmidt tolerance factor $t=(r_{R}+r_{\mathrm{O}})/(\sqrt{2}(r_{B}+r_{\mathrm{O}}))$, where $r_{R}$, $r_{B}$, and $r_{\mathrm{O}}$ are the radii of the $A$-site rare-earth cation, the $B$-site cation, and oxygen, respectively. This tolerance factor is known to be associated with whether $B$O₆ octahedral rotations are likely to occur and distort the ideal cubic perovskite structure ($t=1$).⁶⁷ For $t<1$, the transition metal-oxygen octahedra rotate, making it more difficult for electrons to hop and favoring an MIT. Thus, a lower tolerance factor usually leads to a higher MIT temperature, while a higher $t$ can suppress MIT behavior altogether (e.g., LaNiO₃ is metallic). The ADCR plays a more important role in supporting the MIT classification for LuNiO₃ (lower ADCR) than it does for NdNiO₃ (higher ADCR), capturing the physical trend in the phase diagram in Fig. 11: the ADCR places NdNiO₃ close to metallic LaNiO₃, while it places LuNiO₃ significantly further away towards high $T_{\mathrm{MIT}}$. This physics is captured in the SHAP values in Fig. 12: NdNiO₃ is one of the nickelates with the highest ADCR and the lowest $T_{\mathrm{MIT}}$, making it close to the metallic class as identified by the model, with a log-odds ratio of 5.08. In contrast, LuNiO₃, with an ADCR lower than that of NdNiO₃, has a log-odds ratio of 7.22. In other words, the classifier is more certain that LuNiO₃ is an MIT compound than it is for NdNiO₃. The ADCR is thus similar to a generalized tolerance factor, irrespective of the materials family studied. We note that the ADCR is also strongly correlated with the average deviation of electronegativity, with a linear correlation of 0.919 (see Fig. 16), which we understand as a consequence of the electron affinity of an element being partially determined by its atomic radius.
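As a small worked illustration of the tolerance-factor expression above, the sketch below evaluates $t$ for given radii. The radii used in the example are hypothetical placeholder values in ångström chosen only to show the trend; they are not tabulated ionic radii and are not taken from this work.

```python
from math import sqrt


def tolerance_factor(r_a, r_b, r_o):
    """Goldschmidt tolerance factor t = (r_A + r_O) / (sqrt(2) * (r_B + r_O))."""
    return (r_a + r_o) / (sqrt(2.0) * (r_b + r_o))


# Hypothetical radii in angstrom (placeholders, not tabulated values).
R_O = 1.40
R_B = 0.58
t_large_a = tolerance_factor(1.36, R_B, R_O)  # larger A-site cation
t_small_a = tolerance_factor(1.20, R_B, R_O)  # smaller A-site cation

# A smaller A-site cation lowers t, i.e., moves the structure further from
# the ideal cubic value t = 1 and toward stronger octahedral rotations.
a_site_shrinks_t = t_small_a < t_large_a
```

Because the numerator depends only on the A-site and oxygen radii, replacing the rare-earth cation with a smaller one at fixed $B$ and O always lowers $t$, which is the direction associated above with a higher MIT temperature.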

Although we find the estimated Hubbard $U$ values $U_{0}^{\prime}$ and charge-transfer energies $\Delta_{0}$, which are important to the ZSA classification of correlated metals and insulators, are among the top 8 features, the MIT materials do not strongly cluster when they are plotted in a 2D space consisting of these features (Fig. 13). The GII, ADCR, and the range of the Mendeleev number (discussed later) lead to much stronger class clustering (or separation) than the ZSA-classification energies $U_{0}^{\prime}$ and $\Delta_{0}$, which have been used extensively over the last 30 years. The Hubbard $U$ is a strong counter-indicator of an MIT when large and somewhat less strongly predictive of an MIT when low, which gives some support to the findings in Ref. 52. The presence of materials with extremely high $U_{0}^{\prime}$ values arising from high ionization energies, e.g., titanates, distorts the SHAP color scale for the Estimated Hubbard $U$ row in Fig. 8. High values of $\Delta_{0}$ may also indicate that no MIT occurs, although sometimes the SHAP value of such a material is close to 0. The color scale for the charge transfer energy is also skewed by the presence of negative $\Delta_{0}$ values. These occur when the difference in metal and anion Madelung site potentials is small and/or the ionization energy of the metal is large. One reason that the ZSA classification energies may be difficult to interpret is that the energy estimates we use correspond to unscreened values. Dielectric screening and metal-ligand hybridization effects in solids can lead to significant renormalization of these values, but require electronic-structure-based calculations such as cRPA⁶⁸ to ascertain. Thus, although these numbers provide some information to our machine learning model, they are difficult to understand in isolation.

Two additional features of high importance include the average metal-metal distance and the average metal-nonmetal (metal-ligand) distance. We understand their role in relation to the energies $U_{0}^{\prime}$ and $\Delta_{0}$ within the ZSA framework, which are often scaled relative to the electron-hopping parameters describing correlated electron materials in the form of the d-orbital bandwidth in transition metal compounds. As the d-orbital bandwidth in the low-energy electronic structure is not directly available from the structure alone, a convenient proxy is the metal-metal distance and/or the metal-anion distance as the bandwidth is inversely proportional to the distance between the atomic pairs contributing states that hybridize by symmetry. This explains the strong role of inter-atomic distances in our model.

The Ewald energy reflects how stabilizing the ionic charge distribution in a crystal is due to the electrostatic potential imposed by oppositely charged ions in the atomic structure. The calculations as currently performed in the latest version of Matminer correspond to an electrostatic energy between ions modeled as point charges, with the charge approximated by the nominal ionic oxidation state. This feature, as implemented in Matminer, has recently been updated to be normalized per atom. This Ewald energy per atom then, to first order, separates highly ionic materials from less ionic materials. Phosphides such as CoP₃ (which has an Ewald energy of $-124\,\mathrm{eV/atom}$) have strong ionic character in this picture, with P having a $3-$ charge and Co a $9+$ charge, while sulfides tend to have lower absolute values of the Ewald energy (FeS₂ has an Ewald energy of $-10\,\mathrm{eV/atom}$). Fig. 15 shows oxides that exhibit intermediate values.

This separation based on the anion Ewald energy is similar to that based on the maximum Mendeleev number, which may explain why the maximum of the Mendeleev number (which describes the anion in our compounds) does not have a role in our classification according to our SHAP scores, whereas the range of the Mendeleev number is much more important. The range of the Mendeleev number essentially separates compounds that contain elements from the first three columns of the periodic table or from the lanthanide or actinide series (such as LaNiO₃ or EuO) from those that are binary transition metal compounds (such as FeS or NiO). The importance of this feature is clearer to discern through its interaction with other features, whereas the Ewald energy alone leads to a clear separation based on composition, particularly based on the anion type. From the combination of ADCR and the range of the Mendeleev number, we find strong clustering of MIT materials (Fig. 14). In particular, a range of the Mendeleev number over 40 together with an ADCR below $50\,\mathrm{pm}$ is likely an indicator of an MIT material. We also find that most known MIT compounds tend to be oxides containing yttrium or a lanthanide in their composition (as highlighted by the green dots enclosed in horizontal ellipses in Fig. 14).
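The two-feature rule of thumb stated above can be sketched as a quick screening function. The sketch assumes that the "average deviation" is the composition-weighted mean absolute deviation from the composition-weighted mean, which is a common definition for this type of compositional statistic but is an assumption here. The per-element values in the example are hypothetical placeholders, not real covalent radii or Mendeleev numbers.

```python
def average_deviation(values, fractions):
    """Composition-weighted mean absolute deviation from the weighted mean.

    Assumed definition of the "average deviation" statistic.
    """
    total = sum(fractions)
    weights = [f / total for f in fractions]
    mean = sum(w * v for w, v in zip(weights, values))
    return sum(w * abs(v - mean) for w, v in zip(weights, values))


def value_range(values):
    """Difference between the largest and smallest elemental value."""
    return max(values) - min(values)


def in_mit_cluster(adcr_pm, mendeleev_range, adcr_max=50.0, range_min=40.0):
    """Rule of thumb from the text: ADCR below 50 pm and Mendeleev range over 40."""
    return adcr_pm < adcr_max and mendeleev_range > range_min


# Hypothetical ABX3 compound with placeholder elemental properties.
amounts = [1, 1, 3]                      # atoms per formula unit
covalent_radii_pm = [200.0, 125.0, 65.0]  # placeholder values
mendeleev_numbers = [15, 60, 85]          # placeholder values

adcr = average_deviation(covalent_radii_pm, amounts)
mn_range = value_range(mendeleev_numbers)
candidate = in_mit_cluster(adcr, mn_range)
```

Such a check is only a coarse first screen: it reproduces the region of the 2D feature space where MIT compounds cluster, but it is not a substitute for the trained classifier, which also uses the remaining features.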

Although the Ewald energy SHAP scores in Fig. 8 are difficult to interpret for the database as a whole, they are easier to understand if we focus on a particular family. Consider the $\mathrm{V}_{n}\mathrm{O}_{m}$ binary vanadium oxide family. We find that V₂O₅ with an Ewald energy of $-57\,\mathrm{eV/atom}$ is insulating, while VO with an Ewald energy of $-24\,\mathrm{eV/atom}$ is metallic. The V₆O₁₃, VO₂, V₈O₁₅, V₆O₁₁, V₅O₉, V₄O₇, V₃O₅, and V₂O₃ vanadates, which exhibit intermediate Ewald energies, are all MIT compounds. This observation strongly suggests that the Ewald energy of the MIT compounds within a particular materials family is likely to lie in between that of the metallic and insulating members in that family. A similar analysis can be performed for the $\mathrm{Ti}_{n}\mathrm{O}_{m}$ family (Fig. 15).

The Ewald energy is less useful, however, for differentiating materials with the same stoichiometry and structure type but different chemistry: LaNiO₃ has an Ewald energy of $-33\,\mathrm{eV/atom}$ and LuNiO₃ has an Ewald energy of $-34\,\mathrm{eV/atom}$, despite the very large differences between the two as illustrated in Fig. 11. As the materials classes have varying ranges of Ewald energy, it is difficult to understand its role from the SHAP plot alone without focusing on a particular family.

Finally, we also included the average deviation of the electronegativity as a feature for our classifier. Although it is strongly linearly correlated with the ADCR, we surprisingly find that there is a strong clustering of the MIT compounds within this 2D feature space (Fig. 16).

3.3 Online Classifiers for Predicting Conductivity Classes

Our pre-trained electronic classifiers are deployed and served to the larger materials science community through a cloud service called Binder⁶⁹. There are several Jupyter notebooks⁷⁰ hosted on the Binder server that may be used to easily reproduce the results presented herein. The Binder website offers interactive execution of these notebooks directly in a web browser without installing any dependency onto a local machine. All notebooks are also available at Ref. 41.

Here we present a brief demonstration of the Binder notebook at https://tinyurl.com/mit-classifiers, which enables a user to upload a structure in CIF format and immediately make a classification using our models, without the need to install any software. After the user uploads the structure of the lacunar spinel GaMo₄Se₈, which was identified computationally as a potential MIT material,⁹ the notebook automatically featurizes the new structure and makes a prediction. After evaluation by the three binary classifiers trained on the reduced feature set, GaMo₄Se₈ is classified as an MIT material (Fig. 17). Note that among the three possible conductivity classes, the assigned classifications may not be mutually exclusive since a single ternary classification is not made. The classifiers predict the following probabilities: 0.4761 for being a metal; 0.0479 for being an insulator; 0.5441 for being an MIT compound. The default threshold for making a positive classification is 0.5, which means a positive classification is made only if the predicted probability is greater than 0.5. In this case, GaMo₄Se₈ is predicted only to be an MIT compound. However, it is also worth noting that although GaMo₄Se₈ does not obtain a positive metal classification, its probability for being a metal is 0.4761, which is close to 0.5.
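The decision rule applied to the three classifier outputs can be made explicit with a minimal sketch. It uses the probabilities quoted above for GaMo₄Se₈ and the default threshold of 0.5; the function and variable names are illustrative and are not taken from the notebook.

```python
def positive_classes(probabilities, threshold=0.5):
    """Return every class whose binary classifier exceeds the threshold.

    Each probability comes from an independent binary classifier, so the
    result may contain zero, one, or several classes.
    """
    return [name for name, p in probabilities.items() if p > threshold]


# Probabilities reported in the text for GaMo4Se8.
gamo4se8 = {"metal": 0.4761, "insulator": 0.0479, "MIT": 0.5441}

assigned = positive_classes(gamo4se8)      # classes above the 0.5 threshold
total_probability = sum(gamo4se8.values())  # need not equal 1
```

Because the three classifiers are trained separately, their probabilities do not sum to one (here they sum to about 1.07), which is why the assigned classes are not guaranteed to be mutually exclusive and why a near-threshold value such as 0.4761 is worth inspecting.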

4 Conclusions

In this work, we highlighted and attempted to resolve two important issues in the field of metal-insulator transition compounds, which are in fact quite general to the study of quantum materials more broadly. The first is the lack of a widely accessible database of materials based on a particularly relevant, but rare, property. The second is the lack of a methodology to provide insight into this class of materials that is complementary to that of standard electronic structure and model calculation methods.

Our electronic materials database, comprising MIT compounds as well as related metals and insulators, will help broaden the domain knowledge of other scientists in the field. On this database, we trained three easy-to-interpret machine learning models. We recognized that the training data size is limited and took measures to avoid overfitting whenever possible. We offered a brief analysis of the robustness and extrapolation power of the MIT classifier in the SI. Based on the MIT classifier model, we identified new features that determine whether a material has a temperature-driven MIT or not, advancing our domain knowledge for this type of classification problem. In particular, we found the Global Instability Index, Average Deviation of the Covalent Radius, Ewald Energy, and Range of the Mendeleev Number, as well as combinations of pairs of these features, to be important to the performance of the MIT classification model.

The high importance of the transition metal-transition metal distance, as well as its interaction with other features such as the Hubbard $U$, highlights the ability of machine learning approaches to provide physical insight and confirms previous theories about the nature of the MIT in an unbiased way. MIT materials exhibited strong clustering when plotted in a 2D space spanned by two novel features, the average deviation of the covalent radius (ADCR) and the range of the Mendeleev number, making it possible to quickly assess whether novel materials discovered in the laboratory or predicted computationally may exhibit MITs. We also provided a periodic table with the Mendeleev number and covalent radius of the atoms so that other scientists can quickly calculate these two features. We conjecture that these features may be relevant in creating simple models analogous to the Goldhammer-Herzfeld criterion,⁷¹ which allows for the differentiation between elemental metals and nonmetals. Finally, we offered a simple-to-use online platform that allows users to upload a crystal structure file and obtain a probabilistic prediction of the electronic class of their material.

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: the performance comparison of XGBoost classifiers (1) against other types of machine learning models (e.g., random forest) trained on the full feature set, (2) trained on the full feature set against those trained on the reduced feature set, and (3) trained in this work against other models trained in previous works to classify only metals and insulators; short primer on SHAP values; NLP search keywords; survey analysis comparing classification accuracy of domain experts against XGBoost classifiers; ALE analysis; element heatmaps for metals, insulators and MIT compounds; model evaluation with holdout test sets; and a brief discussion on the robustness and extrapolation power of the MIT classifier. The materials database and calculated features are available online at Ref. 41.

Acknowledgments

The authors thank Professors R. Seshadri and S. Wilson at the University of California, Santa Barbara, for helpful discussions about this project. This work was supported in part by the National Science Foundation (NSF) under award number DMR-1729303. The information, data, or work presented herein was also funded in part by the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award Number DE-AR0001209. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

A.B.G. and P.R. contributed equally to this work, which was initiated by N.W. A.B.G. identified the handpicked features, performed the physical analysis of the results and the human identification and classification of the materials in the final database, and helped coordinate the project. P.R. built the final version of the classifier models, the online pipeline, and the featurizer used throughout the project. A.R.T. built the NLP pipeline and identified relevant compounds from the pipeline to add to the materials database. S.Z. performed the ALE analysis. K.D.M. built the webpage for the materials database. D.A. supervised S.Z. E.A.O. supervised A.R.T. J.M.R. conceived and administered the project. All authors contributed to writing and revising the paper.

References

Imada et al. 1998 Imada, M.; Fujimori, A.; Tokura, Y. Metal-insulator transitions. Reviews of Modern Physics 1998, 70, 1039–1263, DOI: 10.1103/RevModPhys.70.1039
Hsu et al. 2018 Hsu, Y.-T.; Li, X.; Deng, D.-L.; Das Sarma, S. Machine Learning Many-Body Localization: Search for the Elusive Nonergodic Metal. Physical Review Letters 2018, 121, 245701, DOI: 10.1103/PhysRevLett.121.245701
Vargas-Hernández et al. 2018 Vargas-Hernández, R. A.; Sous, J.; Berciu, M.; Krems, R. V. Extrapolating Quantum Observables with Machine Learning: Inferring Multiple Phase Transitions from Properties of a Single Phase. Physical Review Letters 2018, 121, 255702, DOI: 10.1103/PhysRevLett.121.255702
Dong et al. 2019 Dong, X.-Y.; Pollmann, F.; Zhang, X.-F. Machine learning of quantum phase transitions. Physical Review B 2019, 99, 121104, DOI: 10.1103/PhysRevB.99.121104
Shukla et al. 2015 Shukla, N.; Thathachary, A. V.; Agrawal, A.; Paik, H.; Aziz, A.; Schlom, D. G.; Gupta, S. K.; Engel-Herbert, R.; Datta, S. A steep-slope transistor based on abrupt electronic phase transition. Nature Communications 2015, 6, DOI: 10.1038/ncomms8812
Brahlek et al. 2017 Brahlek, M.; Zhang, L.; Lapano, J.; Zhang, H.-T.; Engel-Herbert, R.; Shukla, N.; Datta, S.; Paik, H.; Schlom, D. G. Opportunities in vanadium-based strongly correlated electron systems. MRS Communications 2017, 7, 27–52, DOI: 10.1557/mrc.2017.2
Cui et al. 2018 Cui, Y.; Ke, Y.; Liu, C.; Chen, Z.; Wang, N.; Zhang, L.; Zhou, Y.; Wang, S.; Gao, Y.; Long, Y. Thermochromic VO₂ for Energy-Efficient Smart Windows. Joule 2018, 2, 1707–1746, DOI: 10.1016/j.joule.2018.06.018
Hiroi 2015 Hiroi, Z. Structural instability of the rutile compounds and its relevance to the metal–insulator transition of VO₂. Progress in Solid State Chemistry 2015, 43, 47–69, DOI: 10.1016/j.progsolidstchem.2015.02.001
Wang et al. 2020 Wang, Y.; Iyer, A.; Chen, W.; Rondinelli, J. M. Featureless adaptive optimization accelerates functional electronic materials design. Applied Physics Reviews 2020, 7, 041403, DOI: 10.1063/5.0018811
Georgescu et al. 2019 Georgescu, A. B.; Peil, O. E.; Disa, A. S.; Georges, A.; Millis, A. J. Disentangling lattice and electronic contributions to the metal–insulator transition from bulk vs. layer confined RNiO3. Proceedings of the National Academy of Sciences 2019, 116, 14434–14439, DOI: 10.1073/pnas.1818728116
Domínguez et al. 2020 Domínguez, C.; Georgescu, A. B.; Mundet, B.; Zhang, Y.; Fowlie, J.; Mercy, A.; Waelchli, A.; Catalano, S.; Alexander, D. T. L.; Ghosez, P.; Georges, A.; Millis, A. J.; Gibert, M.; Triscone, J.-M. Length scales of interfacial coupling between metal and insulator phases in oxides. Nature Materials 2020, 19, 1182–1187, DOI: 10.1038/s41563-020-0757-x
Georgescu and Millis 2021 Georgescu, A. B.; Millis, A. J. Energy Landscape analysis of metal-insulator transitions: theory and application to Ca₂RuO₄, RNiO₃ and their heterostructures. 2021, arXiv.org. https://arxiv.org/abs/2105.02271. (accessed 2021-06-01).
Schueller et al. 2020 Schueller, E. C.; Miller, K. D.; Zhang, W.; Zuo, J. L.; Rondinelli, J. M.; Wilson, S. D.; Seshadri, R. Structural signatures of the insulator-to-metal transition in BaCo₁₋ₓNiₓS₂. Physical Review Materials 2020, 4, 104401, DOI: 10.1103/PhysRevMaterials.4.104401
Laurita et al. 2019 Laurita, G.; Puggioni, D.; Hickox-Young, D.; Rondinelli, J. M.; Gaultois, M. W.; Page, K.; Lamontagne, L. K.; Seshadri, R. Uncorrelated Bi off-centering and the insulator-to-metal transition in ruthenium A₂Ru₂O₇ pyrochlores. Physical Review Materials 2019, 3, 095003, DOI: 10.1103/PhysRevMaterials.3.095003
Georgescu et al. 2021 Georgescu, A. B.; Kim, M.; Ismail-Beigi, S. Boson Subsidiary Solver (BoSS) v1.1. Computer Physics Communications 2021, 265, 107991, DOI: 10.1016/j.cpc.2021.107991
Jager et al. 2017 Jager, M. F.; Ott, C.; Kraus, P. M.; Kaplan, C. J.; Pouse, W.; Marvel, R. E.; Haglund, R. F.; Neumark, D. M.; Leone, S. R. Tracking the insulator-to-metal phase transition in VO₂ with few-femtosecond extreme UV transient absorption spectroscopy. Proceedings of the National Academy of Sciences 2017, 114, 9558–9563, DOI: 10.1073/pnas.1707602114
Lee et al. 2018 Lee, D. et al. Isostructural metal-insulator transition in VO₂. Science 2018, 362, 1037–1040, DOI: 10.1126/science.aam9189
Wagner et al. 2018 Wagner, N.; Puggioni, D.; Rondinelli, J. M. Learning from Correlations Based on Local Structure: Rare-Earth Nickelates Revisited. Journal of Chemical Information and Modeling 2018, 58, 2491–2501, DOI: 10.1021/acs.jcim.8b00411, PMID: 30111111
Mercy et al. 2017 Mercy, A.; Bieder, J.; Íñiguez, J.; Ghosez, P. Structurally triggered metal-insulator transition in rare-earth nickelates. Nature Communications 2017, 8, 1677, DOI: 10.1038/s41467-017-01811-x
Peil et al. 2019 Peil, O. E.; Hampel, A.; Ederer, C.; Georges, A. Mechanism and control parameters of the coupled structural and metal-insulator transition in nickelates. Phys. Rev. B 2019, 99, 245127, DOI: 10.1103/PhysRevB.99.245127
Johnston et al. 2014 Johnston, S.; Mukherjee, A.; Elfimov, I.; Berciu, M.; Sawatzky, G. A. Charge Disproportionation without Charge Transfer in the Rare-Earth-Element Nickelates as a Possible Mechanism for the Metal-Insulator Transition. Phys. Rev. Lett. 2014, 112, 106404, DOI: 10.1103/PhysRevLett.112.106404
Fomichev et al. 2020 Fomichev, S.; Khaliullin, G.; Berciu, M. Effect of electron-lattice coupling on charge and magnetic order in rare-earth nickelates. Phys. Rev. B 2020, 101, 024402, DOI: 10.1103/PhysRevB.101.024402
Disa et al. 2017 Disa, A. S.; Georgescu, A. B.; Hart, J. L.; Kumah, D. P.; Shafer, P.; Arenholz, E.; Arena, D. A.; Ismail-Beigi, S.; Taheri, M. L.; Walker, F. J.; Ahn, C. H. Control of hidden ground-state order in $\mathrm{NdNi}{\mathrm{O}}_{3}$ superlattices. Phys. Rev. Materials 2017, 1, 024410, DOI: 10.1103/PhysRevMaterials.1.024410
Liao et al. 2018 Liao, Z. et al. Metal–insulator-transition engineering by modulation tilt-control in perovskite nickelates for room temperature optical switching. Proceedings of the National Academy of Sciences 2018, 115, 9515–9520, DOI: 10.1073/pnas.1807457115
Zhang et al. 2019 Zhang, J. et al. Nano-Resolved Current-Induced Insulator-Metal Transition in the Mott Insulator $\mathrm{Ca}_{2}\mathrm{RuO}_{4}$. Phys. Rev. X 2019, 9, 011032, DOI: 10.1103/PhysRevX.9.011032
Okazaki et al. 2020 Okazaki, R.; Kobayashi, K.; Kumai, R.; Nakao, H.; Murakami, Y.; Nakamura, F.; Taniguchi, H.; Terasaki, I. Current-induced Giant Lattice Deformation in the Mott Insulator Ca₂RuO₄. Journal of the Physical Society of Japan 2020, 89, 044710, DOI: 10.7566/JPSJ.89.044710
Tsurumaki-Fukuchi et al. 2020 Tsurumaki-Fukuchi, A.; Tsubaki, K.; Katase, T.; Kamiya, T.; Arita, M.; Takahashi, Y. Stable and Tunable Current-Induced Phase Transition in Epitaxial Thin Films of Ca₂RuO₄. ACS Applied Materials & Interfaces 2020, 12, 28368–28374, DOI: 10.1021/acsami.0c05181, PMID: 32460482
Bertinshaw et al. 2019 Bertinshaw, J. et al. Unique Crystal Structure of ${\mathrm{Ca}}_{2}{\mathrm{RuO}}_{4}$ in the Current Stabilized Semimetallic State. Phys. Rev. Lett. 2019, 123, 137204, DOI: 10.1103/PhysRevLett.123.137204
Han and Millis 2018 Han, Q.; Millis, A. Lattice Energetics and Correlation-Driven Metal-Insulator Transitions: The Case of $\mathrm{Ca}_{2}\mathrm{RuO}_{4}$. Phys. Rev. Lett. 2018, 121, 067601, DOI: 10.1103/PhysRevLett.121.067601
Periyasamy et al. 2020 Periyasamy, M.; Fjellvåg, Ø. S.; Fjellvåg, H.; Sjåstad, A. O. Coupling of magnetoresistance switching and glassy magnetic state at the metal–insulator transition in Ruddlesden-Popper manganite Ca₄Mn₃O₁₀. Journal of Magnetism and Magnetic Materials 2020, 511, 166949, DOI: 10.1016/j.jmmm.2020.166949
Szymanski et al. 2019 Szymanski, N. J.; Walters, L. N.; Puggioni, D.; Rondinelli, J. M. Design of Heteroanionic MoON Exhibiting a Peierls Metal-Insulator Transition. Phys. Rev. Lett. 2019, 123, 236402, DOI: 10.1103/PhysRevLett.123.236402
Bansal et al. 2020 Bansal, D.; Niedziela, J. L.; Calder, S.; Lanigan-Atkins, T.; Rawl, R.; Said, A. H.; Abernathy, D. L.; Kolesnikov, A. I.; Zhou, H.; Delaire, O. Magnetically driven phonon instability enables the metal–insulator transition in h-FeS. Nature Physics 2020, 16, 669–675, DOI: 10.1038/s41567-020-0857-1
Shannon and Fischer 2016 Shannon, R. D.; Fischer, R. X. Empirical electronic polarizabilities of ions for the prediction and interpretation of refractive indices: Oxides and oxysalts. American Mineralogist 2016, 101, 2288–2300, DOI: 10.2138/am-2016-5730
Naccarato et al. 2019 Naccarato, F.; Ricci, F.; Suntivich, J.; Hautier, G.; Wirtz, L.; Rignanese, G.-M. Searching for materials with high refractive index and wide band gap: A first-principles high-throughput study. Phys. Rev. Materials 2019, 3, 044602, DOI: 10.1103/PhysRevMaterials.3.044602
Hellenbrandt 2004 Hellenbrandt, M. The Inorganic Crystal Structure Database (ICSD)–Present and Future. Crystallogr. Rev. 2004, 10, 17–22, DOI: 10.1080/08893110410001664882
Spr 2019 SpringerMaterials. 2019, https://materials.springer.com/ (accessed 2021-06-01)
Jain et al. 2013 Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; Persson, K. A. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials 2013, 1, 011002, DOI: 10.1063/1.4812323
Saal et al. 2013 Saal, J. E.; Kirklin, S.; Aykol, M.; Meredig, B.; Wolverton, C. Materials Design and Discovery with High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD). JOM 2013, 65, 1501–1509, DOI: 10.1007/s11837-013-0755-4
Curtarolo et al. 2012 Curtarolo, S.; Setyawan, W.; Wang, S.; Xue, J.; Yang, K.; Taylor, R. H.; Nelson, L. J.; Hart, G. L.; Sanvito, S.; Buongiorno-Nardelli, M.; Mingo, N.; Levy, O. AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations. Computational Materials Science 2012, 58, 227–235, DOI: 10.1016/j.commatsci.2012.02.002
Varignon et al. 2019 Varignon, J.; Bibes, M.; Zunger, A. Origin of band gaps in 3d perovskite oxides. Nature Communications 2019, 10, DOI: 10.1038/s41467-019-09698-6
Ren et al. 2021 Ren, P.; Wagner, N.; Georgescu, A.; Rondinelli, J. M. Electronic Materials Binary Classifiers. 2021, https://doi.org/10.5281/zenodo.4765321
Ward et al. 2016 Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput. Mater. 2016, 2, DOI: 10.1038/npjcompumats.2016.28
Peets et al. 2013 Peets, D. C.; Kim, J.-H.; Dosanjh, P.; Reehuis, M.; Maljuk, A.; Aliouane, N.; Ulrich, C.; Keimer, B. Magnetic phase diagram of $\mathrm{Sr}_{3}\mathrm{Fe}_{2}\mathrm{O}_{7-\delta}$. Phys. Rev. B 2013, 87, 214410, DOI: 10.1103/PhysRevB.87.214410
Jensen et al. 2019 Jensen, Z.; Kim, E.; Kwon, S.; Gani, T. Z. H.; Román-Leshkov, Y.; Moliner, M.; Corma, A.; Olivetti, E. A Machine Learning Approach to Zeolite Synthesis Enabled by Automatic Literature Data Extraction. ACS Central Science 2019, 5, 892–899, DOI: 10.1021/acscentsci.9b00193
Sekine et al. 1997 Sekine, C.; Uchiumi, T.; Shirotani, I.; Yagi, T. Metal-Insulator Transition in ${\mathrm{PrRu}}_{4}\mathrm{P}_{12}$ with Skutterudite Structure. Phys. Rev. Lett. 1997, 79, 3218–3221, DOI: 10.1103/PhysRevLett.79.3218
Tshitoyan et al. 2019 Tshitoyan, V.; Dagdelen, J.; Weston, L.; Dunn, A.; Rong, Z.; Kononova, O.; Persson, K. A.; Ceder, G.; Jain, A. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 2019, 571, 95–98, DOI: 10.1038/s41586-019-1335-8
Balachandran and Rondinelli 2013 Balachandran, P. V.; Rondinelli, J. M. Interplay of octahedral rotations and breathing distortions in charge-ordering perovskite oxides. Phys. Rev. B 2013, 88, 054101, DOI: 10.1103/PhysRevB.88.054101
Salinas-Sanchez et al. 1992 Salinas-Sanchez, A.; Garcia-Muñoz, J. L.; Rodriguez-Carvajal, J.; Saez-Puche, R.; Martinez, J. L. Structural characterization of R₂BaCuO₅ (R = Y, Lu, Yb, Tm, Er, Ho, Dy, Gd, Eu and Sm) oxides by X-ray and neutron diffraction. Journal of Solid State Chemistry 1992, 100, 201–211, DOI: 10.1016/0022-4596(92)90094-C
Ward et al. 2018 Ward, L. et al. Matminer: An open source toolkit for materials data mining. Computational Materials Science 2018, 152, 60–69, DOI: 10.1016/j.commatsci.2018.05.018
Zaanen et al. 1985 Zaanen, J.; Sawatzky, G. A.; Allen, J. W. Band gaps and electronic structure of transition-metal compounds. Phys. Rev. Lett. 1985, 55, 418–421, DOI: 10.1103/PhysRevLett.55.418
Torrance et al. 1991 Torrance, J. B.; Lacorre, P.; Asavaroengchai, C.; Metzger, R. M. Why are some oxides metallic, while most are insulating? Physica C: Superconductivity 1991, 182, 351–364, DOI: 10.1016/0921-4534(91)90534-6
Villars et al. 2004 Villars, P.; Cenzual, K.; Daams, J.; Chen, Y.; Iwata, S. Data-driven atomic environment prediction for binaries using the Mendeleev number. Journal of Alloys and Compounds 2004, 367, 167–175, DOI: 10.1016/j.jallcom.2003.08.060
George et al. 2020 George, J.; Waroquiers, D.; Stefano, D. D.; Petretto, G.; Rignanese, G.-M.; Hautier, G. The Limited Predictive Power of the Pauling Rules. Angewandte Chemie International Edition 2020, 59, 7569–7575, DOI: 10.1002/anie.202000829
Brown 1992 Brown, I. D. Chemical and steric constraints in inorganic solids. Acta Crystallographica Section B Structural Science 1992, 48, 553–572, DOI: 10.1107/s0108768192002453
Brown and Poeppelmeier 2014 Brown, I. D., Poeppelmeier, K. R., Eds. Bond Valences; Springer Berlin Heidelberg, 2014; pp 91–128, DOI: 10.1007/978-3-642-54968-7
Hong et al. 2016 Hong, W. T.; Welsch, R. E.; Shao-Horn, Y. Descriptors of Oxygen-Evolution Activity for Oxides: A Statistical Evaluation. The Journal of Physical Chemistry C 2016, 120, 78–86, DOI: 10.1021/acs.jpcc.5b10071
Kramida et al. 2019 Kramida, A.; Ralchenko, Y.; Reader, J.; Team, N. A. NIST Atomic Spectra Database (version 5.7.1). 2019, https://physics.nist.gov/asd. (accessed 2021-06-01)
Chen and Guestrin 2016 Chen, T.; Guestrin, C. XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16. New York, New York, USA, 2016; pp 785–794, DOI: 10.1145/2939672.2939785
Olson et al. 2018 Olson, R. S.; La Cava, W.; Mustahsan, Z.; Varik, A.; Moore, J. H. Data-driven Advice for Applying Machine Learning to Bioinformatics Problems. 2018, arXiv.org. https://arxiv.org/abs/1708.05070. (accessed 2021-06-01).
McInnes et al. 2020 McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. 2018, arXiv.org. https://arxiv.org/abs/1802.03426. (accessed 2021-06-01).
Lundberg and Lee 2017 Lundberg, S. M.; Lee, S.-I. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc., 2017; pp 4765–4774
Pedregosa et al. 2011 Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2011, 12, 2825–2830
Curnoe et al. 2002 Curnoe, S.; Harima, H.; Takegahara, K.; Ueda, K. Structural phase transition and anti-quadrupolar ordering in PrFe₄P₁₂ and PrRu₄P₁₂. Physica B: Condensed Matter 2002, 312-313, 837–839, DOI: 10.1016/S0921-4526(01)01261-3, The International Conference on Strongly Correlated Electron Systems
Yamada 2014 Yamada, I. High-pressure synthesis, electronic states, and structure-property relationships of perovskite oxides, ACu₃Fe₄O₁₂ (A: divalent alkaline earth or trivalent rare-earth ion). Journal of the Ceramic Society of Japan 2014, 122, 846–851, DOI: 10.2109/jcersj2.122.846
Bartel et al. 2019 Bartel, C. J.; Sutton, C.; Goldsmith, B. R.; Ouyang, R.; Musgrave, C. B.; Ghiringhelli, L. M.; Scheffler, M. New tolerance factor to predict the stability of perovskite oxides and halides. Science Advances 2019, 5, eaav0693, DOI: 10.1126/sciadv.aav0693
Aryasetiawan et al. 2004 Aryasetiawan, F.; Imada, M.; Georges, A.; Kotliar, G.; Biermann, S.; Lichtenstein, A. I. Frequency-dependent local interactions and low-energy effective models from electronic structure calculations. Phys. Rev. B 2004, 70, 195104, DOI: 10.1103/PhysRevB.70.195104
Project Jupyter et al. 2018 Project Jupyter; Bussonnier, M.; Forde, J.; Freeman, J.; Granger, B.; Head, T.; Holdgraf, C.; Kelley, K.; Nalvarte, G.; Osheroff, A.; Pacer, M.; Panda, Y.; Perez, F.; Ragan-Kelley, B.; Willing, C. Binder 2.0 - Reproducible, interactive, sharable environments for science at scale. Proceedings of the 17th Python in Science Conference. 2018; pp 113–120, DOI: 10.25080/Majora-4af1f417-011
Kluyver et al. 2016 Kluyver, T.; Ragan-Kelley, B.; Pérez, F.; Granger, B.; Bussonnier, M.; Frederic, J.; Kelley, K.; Hamrick, J.; Grout, J.; Corlay, S.; Ivanov, P.; Avila, D.; Abdalla, S.; Willing, C. In Positioning and Power in Academic Publishing: Players, Agents and Agendas; Loizides, F., Schmidt, B., Eds.; 2016; pp 87–90, DOI: 10.3233/978-1-61499-649-1-87
Yao et al. 2020 Yao, B.; Kuznetsov, V. L.; Xiao, T.; Slocombe, D. R.; Rao, C.; Hensel, F.; Edwards, P. P. Metals and non-metals in the periodic table. Philosophical Transactions of the Royal Society A 2020, 378, DOI: 10.1098/rsta.2020.0213

Table of Contents Graphic