
Feature space of XRD patterns constructed by auto-encoder

Keishu Utimula1, Masao Yano2, Hiroyuki Kimoto2, Kenta Hongo3,4,5, Kousuke Nakano6, Ryo Maezono6

1School of Materials Science, JAIST, Asahidai 1-1, Nomi, Ishikawa 923-1292, Japan

2Toyota Motor Corporation, 1, Toyota-cho, Toyota, Aichi 471-8572, Japan

3Research Center for Advanced Computing Infrastructure, JAIST, Asahidai 1-1, Nomi, Ishikawa 923-1292, Japan

4Center for Materials Research by Information Integration, Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan

5PRESTO, JST, Kawaguchi, Saitama 332-0012, Japan

6School of Information Science, JAIST, Asahidai 1-1, Nomi, Ishikawa 923-1292, Japan
[email protected]
Abstract

It is a natural expectation that only the major peaks, rather than all of them, make an important contribution to characterizing an XRD pattern. We developed a scheme that can identify which peaks are relevant, and to what extent, by using the auto-encoder technique to construct a feature space for XRD peak patterns. Each XRD pattern is projected onto a single point in the two-dimensional feature space constructed by the method. If that point shifts significantly when a peak of interest is masked, we can say the peak is relevant to the characterization represented by the point in the space. In this way we can quantify the relevancy. Using this scheme, we actually found a peak with significant intensity but low relevancy to the characterization of the structure. The peak is not easily explained from a physical viewpoint (e.g., as a higher-order reflection of the same plane index), making it a heuristic finding enabled by the power of machine learning.

I Introduction

Materials informatics (MI) has become a major topic as an application of big-data science to materials discovery Ikebata et al. (2017); Oses et al. (2018). Within this topic, several efforts utilize the auto-encoder technique, e.g., applying it to reduce noise in Laser-Induced Breakdown Spectroscopy (LIBS) spectra Ye et al. (2018). The auto-encoder technique Hinton and Salakhutdinov (2006) was originally introduced as a method for compressing the dimensions of image data using neural networks Hinton and Salakhutdinov (2006). The technique has since been applied to feature extraction Salakhutdinov and Hinton (2009), noise elimination in image data Cho (2013), detection of signal anomalies Sakurada and Yairi (2014), and data generation, e.g., to form images Huang et al. (2018).

One of the promising topics in MI is the recognition of the peak patterns of X-ray diffraction (XRD) analysis Kusne et al. (2014); Suram et al. (2017); Iwasaki et al. (2017); Li et al. (2018); Stanev et al. (2018); Xing et al. (2018); Utimula et al. (2018), which is common in experimental crystal structure analysis Hongo et al. (2018). In terms of how to capture the features of the patterns, all the previous studies above are based on techniques for time-series signal recognition. The auto-encoder technique, on the other hand, takes a different approach: feature extraction is regarded as a grouping operation made over all the data, each datum having some dimension (e.g., the number of points on the time axis). The grouping can be seen as a coarse-graining of information, and hence corresponds to a compression of the data dimension by some means. In the auto-encoder concept, such a compression is realized by a neural network Hinton and Salakhutdinov (2006). There have been several recent reports applying neural networks to problems in materials science Dimiduk et al. (2018); Nash et al. (2018).

In the present study, we applied the auto-encoder technique using a neural network to the recognition of XRD spectra in order to extract features of peak patterns. The features are well captured in a two-dimensional space (the feature space), from which the original data can be reproduced precisely. Distances in this space can distinguish the compositions of doped compounds. Surprisingly, the distance can also be used to identify which peaks are relevant for capturing the features. This is in sharp contrast to the traditional manner of researchers in materials science, who interpret the patterns from so-called 'crystallographic viewpoints', where the features are captured from multiple sources such as Bravais lattice estimations (e.g., from Rietveld analysis Rietveld (1969)), indexing of each diffraction peak, fitting of lattice constants (e.g., Vegard's law Vegard (1921); Denton and Ashcroft (1991)), and charge density estimations (e.g., from MEM analysis Jauch and Palmer (1993); Wei (1985); Gull et al. (1987); Sakata and Sato (1990)). Though our achievement might be described in a technical report simply as 'a dimensional compression from 11,900 dimensions to two by an auto-encoder', it is remarkable that a target problem traditionally considered from multiple human viewpoints is now well captured by machines in only two dimensions.

II Model and Methodology

We applied our framework to a data set of XRD patterns of magnetic alloys, [Sm$_{1-y}$Zr$_y$]Fe$_{12-x}$Ti$_x$, with different concentrations $(x,y)$ Utimula et al. (2018). A fixed concentration still admits multiple possibilities for the inequivalent locations of substituent sites Utimula et al. (2018). For the 10 different concentrations considered here (see Table 1), we have 150 XRD patterns in total. All the XRD data are generated by simulations using density functional theory (DFT) with geometry optimization, being in good agreement with experimental data Utimula et al. (2018).

Table 1: The numbers of inequivalent configurations of [Sm$_{1-y}$Zr$_y$]Fe$_{12-x}$Ti$_x$ to be considered. The numbers in brackets indicate the structures constructed from the $2\times 2\times 2$ supercell (Sm/Zr), while the rest are constructed from the $2\times 2\times 1$ supercell (Fe/Ti).
$y\backslash x$   0.0   0.5   1.0   1.5   2.0
0.000 1 13 22 (1) 27 61
0.125 - - (2) - -
0.250 - - (7) - -
0.375 - - (6) - -
0.500 - - (10) - -

The data are given to a neural network as 11,900-dimensional vectors with components $\left\{I(2\theta_{j})\right\}_{j=1}^{11{,}900}$ ($2\theta = 0\sim 120$ deg., $\delta\theta = 0.01$ deg.) at the input layer. Note that the range of $2\theta$ in our case corresponds to an experimental setting using synchrotron radiation ($\lambda = 0.496\,\text{\AA}$). As an auto-encoder, the network encodes an input vector, compressing its dimension through hidden layers down to two, and then decodes it toward an output layer with the same dimension as the input. The parameters of the network are optimized so that the output reproduces the input as closely as possible. For the implementation, we used PyTorch Paszke et al. (2019) with the ReLU activation function Glorot et al. (2011) for the hidden layers of both the encoder and the decoder. At the final layer of the encoder (decoder), tanh (a linear function) is used as the activation function. Parameters are optimized by the Adam algorithm Kingma and Ba (2014) combined with the mean squared error loss (MSELoss) and L2-norm weight decay. 90% of the data are used for training and the rest for testing. We found the optimal network construction (hyper-parameters), minimizing the mean error over ten samples for numerical stability, to be (number of hidden layers) = 3, (minibatch size) = 10, and (number of epochs) = 1,000.

The network squeezes the 11,900-dimensional data at the input through three hidden layers of 128, 64, and 12 dimensions, finally composing the two-dimensional feature space at the end of the encoder (namely, at the middle of the whole auto-encoder). Fig. 1 shows the feature space, onto which the 150 XRD patterns are each projected as a point. The initial distribution of the points [panel (a)] becomes scattered into clusters, as shown in panel (b), as the learning of the network proceeds. We observe that each cluster [shown by circles in panel (b)] is formed by patterns sharing the same concentration of substituents [corresponding to the same symbol], and hence that the feature space recognizes the compositions of the samples well.
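For concreteness, the network described above can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions: the learning rate, the weight-decay strength, and the data loader (`train_loader`) are placeholders of ours, since only the layer widths, activations, optimizer, loss, minibatch size, and number of epochs are specified above.

```python
import torch
import torch.nn as nn

class XRDAutoEncoder(nn.Module):
    """Auto-encoder compressing an 11,900-point XRD pattern to a 2-D feature vector."""
    def __init__(self, n_in=11900):
        super().__init__()
        # Encoder: 11,900 -> 128 -> 64 -> 12 -> 2, ReLU on hidden layers,
        # tanh at the final (bottleneck) layer, as described in the text.
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 12), nn.ReLU(),
            nn.Linear(12, 2), nn.Tanh(),
        )
        # Decoder: mirror of the encoder; the final layer is linear.
        self.decoder = nn.Sequential(
            nn.Linear(2, 12), nn.ReLU(),
            nn.Linear(12, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, n_in),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training (sketch): Adam + MSELoss + L2 weight decay, minibatch 10, 1,000 epochs.
# The learning rate and weight-decay values below are assumptions, not reported values.
model = XRDAutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
loss_fn = nn.MSELoss()

def train(model, train_loader, n_epochs=1000):
    for epoch in range(n_epochs):
        for x in train_loader:           # x: (batch, 11900) intensities I(2*theta_j)
            optimizer.zero_grad()
            loss = loss_fn(model(x), x)  # train the output to reproduce the input
            loss.backward()
            optimizer.step()
```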

Figure 1: Distributions of the 150 XRD patterns projected onto the two-dimensional feature space composed by our auto-encoder. Points with the same symbol share the same concentration of atomic substitutions but differ in the locations of the substituents, which are inequivalent in the group-theoretical sense Utimula et al. (2018). As the parameters are optimized (i.e., as the learning of the neural network is completed), the initial distribution [panel (a)] comes to form clusters, as shown in panel (b) and enclosed by circles.

III Results and discussion

Fig. 2 shows a comparison between an input XRD pattern and the one reproduced by the auto-encoder. They coincide well with each other at the resolution of the human eye, namely that of traditional analysis by materials science researchers. The result hence confirms that the learning of the XRD patterns by the auto-encoder is well achieved.

Figure 2: Comparison between an input XRD pattern and the one reproduced by our auto-encoder. They coincide well with each other at the resolution of the human eye, confirming that the learning by the auto-encoder is well achieved. Note that the range of $2\theta$ corresponds to an experimental setting using synchrotron radiation ($\lambda = 0.496\,\text{\AA}$).

In the following discussion, we provide several possible ways to utilize the extracted features of XRD patterns: (A) identification of the doping concentration for a given XRD pattern of an unknown sample; (B) clarification of how relevant each peak in a pattern is in contributing to the features; (C) generation of artificial XRD patterns for a given concentration, as an interpolation over XRD patterns, to avoid expensive ab initio analysis.

III.1 Identification of concentrations

Since in Fig. 1(b) samples closer in the feature space have closer concentrations, a sample with unknown concentration is quite likely to be projected to a location near samples of similar concentration. One could then estimate the unknown concentration from distances in the feature space.

For such an identification, we paint the feature space as in Fig. 4, so that color corresponds to concentration. From the color at the point where a given sample is projected, we can estimate the concentration of the sample. One might expect that the painting could be performed by a clustering technique (unsupervised machine learning) such as the k-means method MacQueen (1967). However, we conclude that the k-means method, at least, is not an appropriate choice for the painting: the method relies on the concept of a 'center of gravity' of the data forming each cluster region, and data are sorted based on the distance from the center of gravity of each cluster. Such a concept works well if each cluster forms a simply connected region; otherwise, points close to a center may fall outside its region. As seen in the solid-circled region of Fig. 1(b), our data can form clusters with the same feature that are not simply connected (the triangle symbols form two regions separated from each other). We therefore conclude that, rather than using global knowledge of the data such as centers of gravity, it is better to use local information in the vicinity of the projected point in the space.

Figure 3: Inference of an unknown quantity for a given point $P$ in the feature space. The quantity $Q_P$ is estimated from the fractions $s$ and $t$, measured by distances in the space, assuming linear interpolation.

As the simplest implementation using such local information, we can use linear interpolation, as explained in Fig. 3. Suppose the target sample (an XRD pattern) with an unknown property (the concentration, in the present case) is projected to $P$, in the vicinity of which we can find three data points, $A$, $B$, and $C$, with known properties $Q_A$, etc. When the location of $P$ is described as

$\vec{x}_{AP} = s\cdot\vec{x}_{AB} + t\cdot\vec{x}_{AC}\,,$   (1)

the quantity for $P$ is naively estimated using the same fractions as

$Q_P = Q_A + s\cdot\left(Q_B - Q_A\right) + t\cdot\left(Q_C - Q_A\right)\,.$   (2)

To assign the three known points in the vicinity of $P$ to $A$, $B$, and $C$, we choose them so that the inner products $(\vec{x}_{AB}\cdot\vec{x}_{AP})$ and $(\vec{x}_{AC}\cdot\vec{x}_{AP})$ become large, to obtain plausible interpolations (as shown in Fig. 3). The condition may be interpreted as requiring $P$ to be located inside the triangle formed by $A$, $B$, and $C$, leading to the conditions on the fractions, $s, t > 0$ and $0 < s + t < 1$, as implemented in our program.

By sweeping $P$ over the entire feature space, picking up the three nearest points at each location, we can estimate the concentration at any point in the space using the above formalism, obtaining the painted map shown in Fig. 4.
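A minimal sketch of this interpolation scheme in Python/NumPy follows; the function and variable names are ours, and for brevity it takes the three nearest known points as $A$, $B$, and $C$ rather than selecting them by the inner-product criterion described above.

```python
import numpy as np

def interpolate_quantity(p, points, quantities):
    """Estimate an unknown quantity at point p in the 2-D feature space by
    linear interpolation over three nearby known points, following
    Eqs. (1)-(2).  `points` is an (N, 2) array of projected training data
    and `quantities` the corresponding known values (e.g., concentrations)."""
    # Take the three nearest known points as candidates for A, B, C.
    order = np.argsort(np.linalg.norm(points - p, axis=1))
    a, b, c = points[order[:3]]
    qa, qb, qc = (quantities[i] for i in order[:3])
    # Solve x_AP = s * x_AB + t * x_AC for the fractions (s, t)  [Eq. (1)].
    m = np.column_stack((b - a, c - a))   # 2x2 matrix [x_AB | x_AC]
    s, t = np.linalg.solve(m, p - a)
    # P should lie inside triangle ABC: s, t > 0 and s + t < 1.
    if not (s > 0 and t > 0 and s + t < 1):
        return None                        # no plausible interpolation here
    # Q_P = Q_A + s (Q_B - Q_A) + t (Q_C - Q_A)  [Eq. (2)]
    return qa + s * (qb - qa) + t * (qc - qa)
```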

Figure 4: Estimations of the substitutional concentrations for any points on the feature space represented by color maps. By using such maps, we can identify the concentration for a given sample to be projected on a point in the space.

By using this map, we can identify the concentration of a given sample projected onto a point in the space. Using test data points (for which the concentrations are known), we can examine the prediction performance of the map by comparing the known answers with the estimations. Table 2 lists the results in order of the poorest grades of the error, which is defined, for a concentration Sm$_{c_1}$Zr$_{c_2}$Fe$_{c_3}$Ti$_{c_4}$, between the answer (A) and the estimation (E) as

$\delta = \sum\limits_{\alpha=1}^{4} \left( c_\alpha^{(E)} - c_\alpha^{(A)} \right)^2\,.$   (3)
Table 2: The estimation performance for the sample concentrations of test data using our linear estimation, shown in order of the poorest grades of the error defined in Eq. (3) [the worst ten are shown]. The worst score comes from the fact that there is little learning data for this case, as explained in the main text.
True composition Estimated composition Error
Sm1.000Zr0.000Fe11.000Ti1.000   Sm1.000Zr0.000Fe10.733Ti1.267   $1.43\times 10^{-1}$
Sm1.000Zr0.000Fe11.000Ti1.000   Sm1.000Zr0.000Fe10.958Ti1.042   $3.59\times 10^{-3}$
Sm1.000Zr0.000Fe10.500Ti1.500   Sm1.000Zr0.000Fe10.522Ti1.478   $9.85\times 10^{-4}$
Sm1.000Zr0.000Fe11.000Ti1.000   Sm1.000Zr0.000Fe11.022Ti0.978   $9.31\times 10^{-4}$
Sm1.000Zr0.000Fe10.500Ti1.500   Sm1.000Zr0.000Fe10.490Ti1.510   $1.85\times 10^{-4}$
Sm1.000Zr0.000Fe11.500Ti0.500   Sm0.999Zr0.001Fe11.494Ti0.506   $7.39\times 10^{-5}$
Sm1.000Zr0.000Fe11.500Ti0.500   Sm1.001Zr-0.001Fe11.505Ti0.495   $4.80\times 10^{-5}$
Sm0.625Zr0.375Fe11.000Ti1.000   Sm0.623Zr0.377Fe11.000Ti1.000   $1.03\times 10^{-5}$
Sm1.000Zr0.000Fe10.500Ti1.500   Sm1.000Zr0.000Fe10.501Ti1.499   $4.26\times 10^{-6}$
Sm0.750Zr0.250Fe11.000Ti1.000   Sm0.749Zr0.251Fe11.000Ti1.000   $1.20\times 10^{-6}$
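As a quick sanity check of Eq. (3), the short sketch below (names ours) reproduces the worst-case error in the first row of Table 2.

```python
# Error metric of Eq. (3): squared deviation summed over the four
# stoichiometric coefficients (c_Sm, c_Zr, c_Fe, c_Ti).
def delta(answer, estimate):
    return sum((e - a) ** 2 for a, e in zip(answer, estimate))

# First row of Table 2: true Sm1.000 Zr0.000 Fe11.000 Ti1.000,
# estimated Sm1.000 Zr0.000 Fe10.733 Ti1.267.
print(delta((1.000, 0.000, 11.000, 1.000),
            (1.000, 0.000, 10.733, 1.267)))   # ~1.43e-1
```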

Looking at the errors shown in Table 2, the prediction performance is fairly good, with errors generally below around 0.5%, except for the case with the worst accuracy ($\delta_{\rm worst}^{(1)} = 0.14$). We could trace this case to a shortage of available data, which is a common problem when machine learning is applied to materials science. In the present case, the number of available patterns is completely determined by the number of inequivalent substituent configurations in the space-group-theoretical sense Utimula et al. (2019), without further variation, because the XRD patterns in our study are all generated by zero-temperature ab initio simulations. For more practical and realistic applications compared against experimental data, one could generate further samples with more variety using, e.g., molecular dynamics at finite temperature starting from the geometry optimized at $T=0$, so as to include thermal fluctuations.

The general consensus is that it is impossible to make sense of the quantities represented by the vertical/horizontal axes of such a compressed two-dimensional feature space. In the present case, however, it may be possible to trace their meaning to some extent in the following way: the features in the present XRD case are, in any event, those characterizing crystal structures tuned by the substituents, e.g., the $c/a$ ratio. It is straightforward to generate such variations artificially by simulation in order to capture those features. By observing how the locations of the projected points in the feature space are affected by such artificial, typical changes in the structure, we could extract the trend of what the vertical/horizontal axes represent.

III.2 Identifying relevant peaks

A fundamental question related to XRD is: "Do we really need all the peaks to characterize a structure by its XRD pattern, or does only a specific bandwidth of $2\theta$ matter?" In this context, we would like to identify how relevant each peak is in characterizing the features of the XRD pattern. The most naive idea for measuring the relevancy would be to find peaks that disappear after an XRD pattern is reproduced by the auto-encoder, by comparing input and output patterns. Unfortunately, such an idea does not work because, as seen in Fig. 2, the optimization of our neural network is so successful that the output reproduces an input pattern very well (no such disappeared peaks are found).

We can instead take the idea that the relevancy can be measured by how much the projected location in the feature space is affected when the peak under consideration is masked. Here we define a mask vector $M(2\theta)$ with the same dimension as the XRD pattern, whose components are zero for $2\theta \sim 2\theta + \Delta$ $\left(\Delta = 0.03\right)$ and one otherwise. Using the mask vector, the masked XRD pattern can be represented as ${\rm XRD}_M(2\theta) := M(2\theta)\circ{\rm XRD}$, where the symbol $\circ$ stands for the Hadamard product. Letting $\mathcal{P}\left[{\rm XRD}\right]$ be the projection of an XRD pattern onto the feature space, the displacement caused by masking can be evaluated as an ordinary distance in the feature space: ${\rm Dev}(2\theta) = \left|\mathcal{P}\left[{\rm XRD}_M(2\theta)\right] - \mathcal{P}\left[{\rm XRD}_0\right]\right|$. This measures how much masking the peak at $2\theta$ affects the location in the feature space, and hence corresponds to an 'intensity of relevancy'. The plot of this intensity, ${\rm Dev}(2\theta)$, is shown in Fig. 5, superimposed on the XRD pattern. This XRD pattern is for a test datum with composition Sm$_{1.0}$Zr$_{0.0}$Fe$_{10.5}$Ti$_{1.5}$.
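A minimal sketch of this masking scan, assuming a trained `encoder` (the first half of the auto-encoder sketched in Sec. II) and names of our own choosing:

```python
import numpy as np
import torch

def relevancy_scan(pattern, encoder, two_theta, width=0.03):
    """Dev(2*theta): shift of the projected point in the 2-D feature space
    when the window [2*theta, 2*theta + width] of the pattern is masked.
    `two_theta` is the grid of 2*theta values (deg.) matching `pattern`."""
    with torch.no_grad():
        p0 = encoder(torch.as_tensor(pattern, dtype=torch.float32))  # P[XRD_0]
    dev = np.zeros(len(two_theta))
    for j, tt in enumerate(two_theta):
        # Mask vector M(2*theta): zero inside the window, one elsewhere.
        mask = ~((two_theta >= tt) & (two_theta < tt + width))
        masked = pattern * mask                      # Hadamard product M o XRD
        with torch.no_grad():
            pm = encoder(torch.as_tensor(masked, dtype=torch.float32))
        dev[j] = torch.linalg.norm(pm - p0).item()   # Dev(2*theta)
    return dev
```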

Figure 5: The intensity of the relevancy of each peak, as defined by ${\rm Dev}(2\theta)$ (open circles), compared with the original XRD pattern (solid line). This XRD pattern is for a test datum with composition Sm$_{1.0}$Zr$_{0.0}$Fe$_{10.5}$Ti$_{1.5}$. As naturally expected, the two correlate well, but at some peaks [e.g., at (a)] they do not.

As naturally expected, the original XRD intensity and the relevancy correlate well (namely, the relevancy takes large values where the XRD intensity is large), but at some peaks [e.g., at (a) in Fig. 5] they do not. Though the peak at (a) has significant intensity, its relevancy is found to be almost zero. One might suppose this happens because such a peak comes from a higher-order plane index, having some intensity while its relevant information is already reflected in the lower-order peaks. We found, however, that peak (a) has a fundamental index, (0,0,1), and is not a higher-order reflection. The finding at (a) is hence "beyond human expectation", and can be said to be knowledge obtainable only by machine learning.

III.3 Interpolation of XRD patterns

XRD provides inferences not only of lattice constants but also of other valuable information such as strains, crystalline sizes, etc. It is therefore desirable that the learning data to be compared with a given experimental XRD pattern have as fine a resolution in composition as possible. Realizing such a fine resolution by ab initio simulation is, however, generally very difficult: the conventional treatment of atomic substitutions using supercell models requires a very large supercell to represent a tiny percentage of substitution, which is practically impossible to perform. This is how we came up with the idea of "interpolating" the discrete points (50%, 25%, 12.5%, etc.) that are feasible at realistic cost in ab initio simulations. Such a conceptual idea can actually be realized by our feature space. Since a contour map for composition is constructed on the feature space as in Fig. 4, we can interpolate to find a point corresponding to a desired composition on the contour. The output pattern generated by the auto-encoder from that point in the feature space can be a plausible XRD pattern for the composition, obtained without costly ab initio simulations. This approach can be applied not only to XRD but also to other spectra, and even to other physical quantities evaluated by ab initio simulations. The approach may also be regarded as another remedy for a long-standing difficulty of ab initio calculations, i.e., the computational cost of treating tiny concentrations with large supercell models.
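A minimal sketch of the decoding step, assuming a trained `decoder` (the second half of the auto-encoder sketched in Sec. II); the function name is ours, and the feature-space point `z` would be chosen on the composition contour of Fig. 4 (e.g., via the interpolation sketched in Sec. III.1) for the desired concentration.

```python
import torch

def generate_xrd_at(z, decoder):
    """Decode a point z = (z1, z2) of the 2-D feature space into an
    artificial but plausible XRD pattern (11,900 intensities)."""
    with torch.no_grad():
        return decoder(torch.tensor(z, dtype=torch.float32)).numpy()
```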

IV Conclusion

We developed an auto-encoder that forms a feature space describing XRD patterns. The framework was trained well enough by the data to reproduce input peak patterns at the level of human-eye recognition. Each XRD pattern is projected to a point in the two-dimensional feature space, forming clusters whose members have concentrations close to each other. Distances in the space can be used to estimate the concentration of any given XRD sample by projecting it onto the space. We composed a contour map of concentration on the space using linear interpolation between the training data. Examining the prediction performance on test data, we confirmed that the error is within a few percent, except for cases with little training data. We proposed a couple of interesting applications of the feature space. First, a way to identify the relevancy of each peak in characterizing the XRD features: the idea is implemented as the observation of how the location of a point in the feature space shifts when the peak of interest is masked from the original XRD pattern. By this method, we found a non-trivial case of a peak having considerable intensity but little relevancy, for which we could not give a reasonable account of the tiny relevancy from a physical viewpoint (e.g., higher-order reflections). As another application, we proposed a way to interpolate XRD patterns, avoiding expensive ab initio simulations with their difficulty in handling tiny changes in concentration. The interpolation is made in the feature space, and the auto-encoder can then generate an artificial but plausible XRD pattern for the interpolated point with the desired composition. This approach can be regarded as a useful remedy for achieving finer resolution in concentration while avoiding the computational cost of ab initio simulations.

V Acknowledgments

The computations in this work have been performed using the facilities of the Research Center for Advanced Computing Infrastructure at JAIST. R.M. is grateful for financial support from MEXT-KAKENHI (19H04692 and 16KK0097), from FLAGSHIP2020 (project nos. hp190169 and hp190167 at K-computer), from Toyota Motor Corporation, from the I-O DATA Foundation, from the Air Force Office of Scientific Research (AFOSR-AOARD/FA2386-17-1-4049; FA2386-19-1-4015), and from JSPS Bilateral Joint Projects (with India DST). The X-ray diffraction (XRD) measurements for the Sm-Fe-Ti system were performed at beamline BL02B2 of SPring-8 under proposal No. 2017B1634.

References