Feature space of XRD patterns constructed by auto-encoder
Abstract
It would be a natural expectation that only major peaks, not all of them, make an important contribution to the characterization of an XRD pattern. We developed a scheme that can identify which peaks are relevant, and to what extent, by using the auto-encoder technique to construct a feature space for XRD peak patterns. Each XRD pattern is projected onto a single point in the two-dimensional feature space constructed by the method. If the point shifts significantly when a peak of interest is masked, we can say the peak is relevant to the characterization represented by that point in the space. In this way, we can quantify the relevancy. Using this scheme, we actually found a peak with significant intensity but low relevancy to the characterization of the structure. The peak is not easily explained from a physical viewpoint (e.g., as a higher-order reflection of the same plane index), making it a heuristic finding obtained by the power of machine learning.
I Introduction
Materials informatics (MI) has become a major topic as an application of big-data science to materials discovery Ikebata et al. (2017); Oses et al. (2018). Within this topic, several efforts utilize the auto-encoder technique, e.g., applying it to reduce noise in Laser-Induced Breakdown Spectroscopy (LIBS) spectra. Ye et al. (2018) The auto-encoder technique Hinton and Salakhutdinov (2006) was originally introduced as a method for compressing the dimensions of image data using neural networks. Hinton and Salakhutdinov (2006) The technique has since been applied to feature extraction Salakhutdinov and Hinton (2009), noise elimination for image data Cho (2013), detection of signal anomalies Sakurada and Yairi (2014), and data generation such as image synthesis. Huang et al. (2018)
One of the promising topics in MI is the recognition of the peak patterns in X-ray diffraction (XRD) analysis Kusne et al. (2014); Suram et al. (2017); Iwasaki et al. (2017); Li et al. (2018); Stanev et al. (2018); Xing et al. (2018); Utimula et al. (2018), which is common in experimental crystal structure analysis. Hongo et al. (2018) In terms of how to capture the features of the patterns, all the previous studies above are based on applying time-series signal-recognition techniques. The auto-encoder technique, on the other hand, takes a different approach to capturing features: feature extraction is regarded as a grouping operation made over all the data, each of which has some data dimension (e.g., the number of points on the time axis). The grouping can be seen as a coarse-graining of information, and hence corresponds to a compression of the data dimension by some means. In the auto-encoder concept, such a compression is realized by a neural network Hinton and Salakhutdinov (2006). There have been several recent reports applying neural networks to problems in materials science Dimiduk et al. (2018); Nash et al. (2018).
In the present study, we applied the auto-encoder technique using a neural network to the recognition of XRD spectra to extract features of peak patterns. The features are well captured in a two-dimensional space (the feature space), from which the original data can be reproduced precisely. The distance in this space can recognize the compositions of the doped compounds. Surprisingly, the distance can also be used to distinguish which peaks are relevant for capturing the features. This is quite in contrast to the traditional manner of researchers in materials science, who interpret the patterns from so-called 'crystallographic viewpoints', where the features are captured from multiple natures such as Bravais-lattice estimations (e.g., from Rietveld analysis Rietveld (1969)), indexing of each diffraction peak, fitting of lattice constants (e.g., Vegard's law Vegard (1921); Denton and Ashcroft (1991)), and charge-density estimations (e.g., from MEM analysis Jauch and Palmer (1993); Wei (1985); Gull et al. (1987); Sakata and Sato (1990)). Though our achievement could be described, in a technical-report manner, as just 'a dimensional compression from the original 11,900 dimensions into two by an auto-encoder', it is surprising that a target problem traditionally considered from multiple human viewpoints is now well captured in only two dimensions by a machine.
II Model and Methodology
We applied our framework to a data set of XRD patterns of magnetic alloys, Sm$_{1-y}$Zr$_y$Fe$_{12-x}$Ti$_x$, with different concentrations $(x, y)$. Utimula et al. (2018) Even a fixed concentration includes multiple possibilities for the inequivalent locations of substituent sites. Utimula et al. (2018) For the 10 different concentrations considered here (see Table 1), we have 150 XRD patterns in total. All the XRD data are generated by density functional theory (DFT) simulations with geometry optimization, which coincide quite well with experimental data. Utimula et al. (2018)
Table 1: Number of inequivalent substitution configurations (XRD patterns) at each concentration of Sm$_{1-y}$Zr$_y$Fe$_{12-x}$Ti$_x$ (columns: $x$; rows: $y$; 150 patterns in total).

$y \backslash x$ | 0.0 | 0.5 | 1.0 | 1.5 | 2.0
---|---|---|---|---|---
0.000 | 1 | 13 | 22 (1) | 27 | 61
0.125 | - | - | (2) | - | -
0.250 | - | - | (7) | - | -
0.375 | - | - | (6) | - | -
0.500 | - | - | (10) | - | -
The data are given to the neural network as 11,900-dimensional vectors of intensities sampled on a $2\theta$ grid ($2\theta$ = 1–120 deg., $\Delta(2\theta)$ = 0.01 deg.) at the input layer. Note that this range of $2\theta$ corresponds to the experimental setting using synchrotron radiation. As an auto-encoder, the network encodes an input vector, compressing its dimension through hidden layers down to two, and then decodes it toward the output layer with the same dimension as the input. The parameters of the network are optimized so that the output reproduces the input identically. For the implementation, we used PyTorch Paszke et al. (2019) with the ReLU activation function Glorot et al. (2011) for the hidden layers of both the encoder and the decoder. At the final layer of the encoder (decoder), tanh (a linear function) is used as the activation function. The parameters are optimized by the Adam algorithm Kingma and Ba (2014), combined with error estimation by MSELoss (mean squared error) and L2-norm weight decay. 90% of the data are used for training and the rest for testing. We determined the optimal network construction (hyper-parameters) so as to minimize the mean error over ten samples for numerical stabilization: the number of hidden layers = 3, the minibatch size = 10, and the number of epochs = 1,000.
The network squeezes the data dimension from 11,900 at the input to 128, 64, and 12 through the three hidden layers, and finally composes the two-dimensional feature space at the final edge of the encoder (namely, at the middle of the whole auto-encoder); a minimal sketch of this construction is given below. Fig. 1 shows the feature space, onto which the 150 XRD patterns are each projected as a point. The initial distribution of the points [panel (a)] becomes scattered into clusters, as shown in panel (b), as the learning in the network proceeds. We observe that each cluster [circles in panel (b)] is formed by the patterns sharing the same concentration of substituents [the same symbol], and hence that the feature space recognizes the compositions of the samples well.
[Fig. 1: Two-dimensional feature space onto which the 150 XRD patterns are projected, (a) before and (b) after training; clusters (circles) correspond to shared concentrations (symbols).]
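To make the construction concrete, the following is a minimal PyTorch sketch under the settings quoted above (layer widths 11,900 → 128 → 64 → 12 → 2, ReLU hidden layers, tanh/linear final layers, Adam with MSE loss and weight decay, minibatch size 10, 1,000 epochs). The learning rate, weight-decay strength, the class/variable names, and the random placeholder data are our assumptions for illustration, not values from the study.

```python
import torch
import torch.nn as nn

DIM_IN, WIDTHS, DIM_LATENT = 11_900, [128, 64, 12], 2

class XRDAutoEncoder(nn.Module):
    """Encoder: 11,900 -> 128 -> 64 -> 12 -> 2 (tanh at the end).
    Decoder mirrors the encoder, with a linear final layer."""
    def __init__(self):
        super().__init__()
        enc, d = [], DIM_IN
        for w in WIDTHS:                              # ReLU hidden layers
            enc += [nn.Linear(d, w), nn.ReLU()]
            d = w
        enc += [nn.Linear(d, DIM_LATENT), nn.Tanh()]  # 2D feature space
        self.encoder = nn.Sequential(*enc)

        dec, d = [], DIM_LATENT
        for w in reversed(WIDTHS):                    # mirrored decoder
            dec += [nn.Linear(d, w), nn.ReLU()]
            d = w
        dec.append(nn.Linear(d, DIM_IN))              # linear output layer
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = XRDAutoEncoder()
# Adam + L2 weight decay; MSE reconstruction loss (lr/decay assumed here).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
loss_fn = nn.MSELoss()

data = torch.rand(150, DIM_IN)        # placeholder for the 150 XRD vectors
train, test = data[:135], data[135:]  # 90% training / 10% test split

for _ in range(1000):                 # 1,000 epochs, minibatches of 10
    for batch in train.split(10):
        optimizer.zero_grad()
        loss = loss_fn(model(batch), batch)
        loss.backward()
        optimizer.step()
```

In this sketch, `model.encoder(x)` gives the projection of a pattern onto the feature space used throughout the following sections.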
III Results and discussion
Fig. 2 shows a comparison between an input XRD pattern and the one reproduced by the auto-encoder. They coincide well with each other at the level of human-eye resolution, namely that of the traditional analysis by materials science researchers. This result ensures that the learning of the XRD patterns by the auto-encoder is well achieved.
[Fig. 2: Comparison between an input XRD pattern and the pattern reproduced by the auto-encoder.]
In the following discussion, we provide several possible ways to utilize the extracted features of XRD patterns: (A) identification of the doping concentration for a given XRD pattern of an unknown sample; (B) clarifying the relevancy (or irrelevancy) of each peak in a pattern in contributing to the features; and (C) generating artificial XRD patterns for a given concentration, as an interpolation over XRD patterns, to omit expensive ab initio analysis.
III.1 Identification of concentrations
Since, in Fig. 1(b), samples closer to each other in the feature space have closer concentrations, a sample with an unknown concentration is quite likely to be projected to a location close to the samples with similar concentrations. One could then estimate the unknown concentration from distances in the feature space.
For such an identification, we paint the feature space as in Fig. 4, so that the color corresponds to the concentration. From the color at the location where a given sample is projected, we can estimate the concentration of the sample. One might wonder whether the painting can be performed by a clustering technique (unsupervised machine learning) such as the k-means method. MacQueen (1967) However, we conclude that at least the k-means method is not an appropriate choice for the painting: the method relies on the concept of the 'center of gravity' of the data forming each cluster region, and data are sorted based on the distance from the center of gravity of each cluster. Such a concept works well if each cluster forms a simply connected region; otherwise, points at a closer distance from the center might fall outside the region. As seen in the solid-circled region of Fig. 1(b), our data can form clusters sharing the same feature that are not simply connected (the triangle symbols form two mutually separated regions). We therefore conclude that, rather than using global knowledge of the data such as the center of gravity, it is better to use local information in the vicinity of the projected point in the space.
[Fig. 3: Schematic of the linear interpolation using the three data points nearest to the projected target in the feature space.]
As the simplest implementation using such local information, we can use linear interpolation, as explained in Fig. 3. Suppose the target sample (an XRD pattern) with an unknown property (the concentration in the present case) is projected to $\vec{x}$, in the vicinity of which we find three data points, $\vec{x}_1$, $\vec{x}_2$, and $\vec{x}_3$, with known properties $q_1$, $q_2$, and $q_3$. When the location of $\vec{x}$ is described as

$\vec{x} = \vec{x}_1 + \alpha \left( \vec{x}_2 - \vec{x}_1 \right) + \beta \left( \vec{x}_3 - \vec{x}_1 \right) , \qquad (1)$

the quantity $q$ for $\vec{x}$ is naively estimated using the same fractions as

$q = q_1 + \alpha \left( q_2 - q_1 \right) + \beta \left( q_3 - q_1 \right) . \qquad (2)$
As for how to assign the three known points in the vicinity of $\vec{x}$ to $\vec{x}_1$, $\vec{x}_2$, and $\vec{x}_3$, we choose them so that the inner products $(\vec{x}_2 - \vec{x}_1) \cdot (\vec{x} - \vec{x}_1)$ and $(\vec{x}_3 - \vec{x}_1) \cdot (\vec{x} - \vec{x}_1)$ get larger, to obtain plausible interpolations (as shown in Fig. 3). The condition may be interpreted as requiring $\vec{x}$ to be located inside the triangle formed by $\vec{x}_1$, $\vec{x}_2$, and $\vec{x}_3$, leading to the conditions on the fractions, $0 \le \alpha, \beta$ and $\alpha + \beta \le 1$, as implemented in our program.
By sweeping over the whole feature space, picking up the three nearest points everywhere, we can estimate the concentration at any point of the space using the above formalism, obtaining the painted map shown in Fig. 4; a sketch of this procedure is given below.
[Fig. 4: Feature space painted by the interpolated concentration (contour map).]
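As a concrete illustration of Eqs. (1)–(2) and the triangle condition, the following sketch estimates the property at a query point from its three nearest neighbours. The function name and the random stand-in data are hypothetical; real projected points and concentrations would replace them.

```python
import numpy as np

def estimate_property(x, points, values):
    """Barycentric estimate at query x (shape (2,)) from the three
    nearest of `points` (shape (N, 2)) with known `values` (shape (N,)).
    Returns None when x falls outside the triangle, i.e., when the
    fractions violate 0 <= alpha, beta and alpha + beta <= 1."""
    i1, i2, i3 = np.argsort(np.linalg.norm(points - x, axis=1))[:3]
    x1, x2, x3 = points[i1], points[i2], points[i3]
    # Solve  x - x1 = alpha (x2 - x1) + beta (x3 - x1)   [Eq. (1)]
    A = np.column_stack([x2 - x1, x3 - x1])
    try:
        alpha, beta = np.linalg.solve(A, x - x1)
    except np.linalg.LinAlgError:      # degenerate (collinear) neighbours
        return None
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        return None                    # x lies outside the triangle
    q1, q2, q3 = values[i1], values[i2], values[i3]
    return q1 + alpha * (q2 - q1) + beta * (q3 - q1)   # Eq. (2)

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (150, 2))     # stand-in for projected patterns
qs = rng.uniform(0.0, 2.0, 150)        # stand-in for known concentrations
# Sweeping such queries over a grid yields the painted map of Fig. 4.
print(estimate_property(np.array([0.1, -0.2]), pts, qs))
```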
By using this map, we can identify the concentration of a given sample projected onto a point in the space. Using test data points (for which the concentration is known), we can examine the prediction performance of the map by comparing the known answer with the estimation. Table 2 lists the results in order of the largest error, which is defined, for a concentration Sm$_{1-y}$Zr$_y$Fe$_{12-x}$Ti$_x$, between the answer (A) and the estimation (E), as

$\Delta \equiv \frac{1}{2} \left( \left| y_{\rm A} - y_{\rm E} \right| + \left| x_{\rm A} - x_{\rm E} \right| \right) . \qquad (3)$
Table 2: Comparison between true and estimated compositions for the test data, sorted from the largest error $\Delta$.

True composition | Estimated composition | Error
---|---|---
Sm1.000Zr0.000Fe11.000Ti1.000 | Sm1.000Zr0.000Fe10.733Ti1.267 |
Sm1.000Zr0.000Fe11.000Ti1.000 | Sm1.000Zr0.000Fe10.958Ti1.042 |
Sm1.000Zr0.000Fe10.500Ti1.500 | Sm1.000Zr0.000Fe10.522Ti1.478 |
Sm1.000Zr0.000Fe11.000Ti1.000 | Sm1.000Zr0.000Fe11.022Ti0.978 |
Sm1.000Zr0.000Fe10.500Ti1.500 | Sm1.000Zr0.000Fe10.490Ti1.510 |
Sm1.000Zr0.000Fe11.500Ti0.500 | Sm0.999Zr0.001Fe11.494Ti0.506 |
Sm1.000Zr0.000Fe11.500Ti0.500 | Sm1.001Zr-0.001Fe11.505Ti0.495 |
Sm0.625Zr0.375Fe11.000Ti1.000 | Sm0.623Zr0.377Fe11.000Ti1.000 |
Sm1.000Zr0.000Fe10.500Ti1.500 | Sm1.000Zr0.000Fe10.501Ti1.499 |
Sm0.750Zr0.250Fe11.000Ti1.000 | Sm0.749Zr0.251Fe11.000Ti1.000 |
Looking at the errors in Table 2, the prediction performs fairly well, generally achieving errors around or below 0.5%, except for one case with the worst accuracy ($\Delta$ = 0.14). We found that this case is attributable to the shortage of available data, which is a common problem when machine learning is applied to materials science. In the present case, the available number of patterns is completely specified by the number of inequivalent substituent locations in the space-group-theoretical sense, Utimula et al. (2019) without further variations, because the XRD patterns in our study are all generated by zero-temperature ab initio simulations. For more practical and realistic applications to be compared with experimental data, one could generate further samples with more variety using, e.g., molecular dynamics at finite temperature, starting from the geometry optimized at zero temperature, so as to incorporate thermal fluctuations.
The general consensus is that it is impossible to make sense of what the quantities on the vertical/horizontal axes mean in such a compressed two-dimensional feature space. In the present case, however, it may be possible to trace the meaning to some extent in the following way: the features in the present XRD case are, in any case, those characterizing the crystal structures tuned by the substituents, e.g., lattice-parameter ratios etc. It is straightforward to generate such variations artificially by simulations in order to capture these features. By observing how the locations of the projected points on the feature space are affected by such artificial/typical changes in the structure, we could extract trends in what the vertical/horizontal axes represent.
III.2 Identifying relevant peaks
A fundamental question related to XRD is: "Do we really need all the peaks to characterize a structure with XRD patterns? Does only a specific bandwidth of $2\theta$ matter?" In this context, we would like to identify how relevant each peak is in characterizing the features of an XRD pattern. The most naive idea for measuring the relevancy would be to find peaks that disappear after an XRD pattern is reproduced by the auto-encoder, by comparing the input and output patterns. Unfortunately, this idea does not work because, as seen in Fig. 2, the optimization of our neural network is so successful that the output reproduces an input pattern very well (no such disappearing peaks are found).
We can instead take the idea that the relevancy is measured by how much the projected location on the feature space is shifted when the peak in question is masked. Here we define a mask vector $\vec{m}(\theta_p)$, having the same dimension as the XRD vector $\vec{x}$, with components being zero for $2\theta$ within the masking window around the peak position $\theta_p$ and one otherwise. By using the mask vector, the masked XRD pattern can then be represented as $\vec{m}(\theta_p) \odot \vec{x}$, where the symbol $\odot$ stands for the Hadamard (element-wise) product. Denoting by $\vec{f}(\vec{x})$ the projection of an XRD pattern onto the feature space, the displacement caused by the masking can be evaluated as a normal distance on the feature space, $d(\theta_p) = \| \vec{f}(\vec{m}(\theta_p) \odot \vec{x}) - \vec{f}(\vec{x}) \|$. This measures how much the masked peak at $\theta_p$ affects the location on the feature space, and hence corresponds to an 'intensity of relevancy'; a sketch of this masking procedure is given at the end of this subsection. The plot of the intensity, $d(\theta_p)$, is shown in Fig. 5, superimposed on the XRD pattern. This XRD pattern is for a test data point with the composition Sm1.0Zr0.0Fe10.5Ti1.5.
[Fig. 5: Relevancy intensity $d(\theta_p)$ superimposed on the XRD pattern for Sm1.0Zr0.0Fe10.5Ti1.5; the peak labeled (a) has significant intensity but almost zero relevancy.]
As naturally expected, the original XRD intensity and the relevancy correlate well (namely, the relevancy takes large values where the XRD intensity gets larger), but at some peaks [e.g., at (a) in Fig. 5] they do not. Though the peak at (a) has a significant intensity, its relevancy is found to be almost zero. One might suppose this happens because such a peak comes from a higher-order plane index, so that it has some intensity while its relevant information is already reflected in the lower-order ones. We found, however, that the peak (a) has a fundamental index, (0,0,1), and is not a higher-order reflection. The finding at (a) is hence a sort of 'beyond human expectation', and can be said to be knowledge that could only be obtained by machine learning.
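The following is a minimal sketch of the masking procedure described above. The trained encoder $\vec{f}$ is represented here by a random stand-in so that the snippet runs on its own; the window half-width, the sweep stride, and the names (`encode`, `relevancy`) are our assumptions for illustration.

```python
import numpy as np

N = 11_900                                 # number of 2-theta grid points
rng = np.random.default_rng(0)
W = rng.normal(size=(2, N)) / np.sqrt(N)   # stand-in for the trained encoder

def encode(x):
    """Projection f(x) onto the 2D feature space (random stand-in)."""
    return np.tanh(W @ x)

def relevancy(x, half_width=25, stride=50):
    """d(theta_p) = || f(m(theta_p) * x) - f(x) ||, sweeping the mask
    window over the 2-theta grid (coarsely, every `stride` points)."""
    base = encode(x)
    d = np.zeros(N)
    for p in range(0, N, stride):
        m = np.ones(N)                     # mask vector: zero inside the
        m[max(0, p - half_width): p + half_width + 1] = 0.0  # window
        d[p] = np.linalg.norm(encode(m * x) - base)
    return d

x = np.abs(rng.normal(size=N))             # stand-in XRD pattern
intensity = relevancy(x)                   # cf. the plot in Fig. 5
```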
III.3 Interpolation of XRD patterns
XRD provides inferences not only of lattice constants but also of other valuable information such as strains, crystalline sizes, etc. It is therefore desirable that the learning data to be compared with a given experimental XRD pattern have as fine a resolution in composition as possible. Realizing such a fine resolution by ab initio simulations is, however, generally very difficult: the conventional treatment of atomic substitutions using a supercell model requires a very large supercell to represent a tiny percentage of substitution, which is practically impossible to perform. This led us to the idea of 'interpolating' between the discrete compositions (50%, 25%, 12.5%, etc.) that are feasible at realistic cost in ab initio simulations. Such a conceptual idea can actually be realized on our feature space. Since a contour map of the composition is constructed on the feature space, as in Fig. 4, we can interpolate to find a point corresponding to a desired composition on the contour. The output pattern generated by the auto-encoder's decoder from that point could be a plausible XRD pattern for the composition, obtained without costly ab initio simulations; a sketch is given below. This approach can be applied not only to XRD but also to other spectra, and even to other physical quantities evaluated by ab initio simulations. The approach can also be regarded as another remedy for a long-standing difficulty of ab initio calculations, i.e., the computational cost of treating tiny concentrations with larger supercell models.
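A sketch of the generation step follows: scan the feature space for a point whose estimated concentration matches the target, then decode that point into an artificial pattern. The decoder and the concentration map are random stand-ins here; in practice one would use the trained decoder and the painted map of Fig. 4 (e.g., via the barycentric estimate sketched earlier).

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (150, 2))       # stand-in projected training data
qs = rng.uniform(0.0, 2.0, 150)          # their known concentrations x
D = rng.normal(size=(11_900, 2))         # stand-in for the trained decoder

def decode(z):
    """Generate an (artificial) XRD pattern from a feature-space point."""
    return D @ z

def concentration_at(z):
    """Cheap proxy for the painted map: nearest-neighbour concentration."""
    return qs[np.argmin(np.linalg.norm(pts - z, axis=1))]

def generate_pattern(target_q, n_grid=100):
    """Find the grid point of the feature space whose estimated
    concentration is closest to target_q, then decode it."""
    grid = np.linspace(-1.0, 1.0, n_grid)
    zs = np.array([[u, v] for u in grid for v in grid])
    errs = np.array([abs(concentration_at(z) - target_q) for z in zs])
    return decode(zs[np.argmin(errs)])

xrd_artificial = generate_pattern(1.25)  # e.g., target composition x = 1.25
```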
IV Conclusion
We developed an auto-encoder to form a feature space describing XRD patterns. The framework was trained well enough by the data to reproduce input peak patterns at the level of human-eye recognition. Each XRD pattern is projected to a point on the two-dimensional feature space, forming clusters in which the concentrations are close to each other. Distances in the space can be used to estimate the concentration of any given XRD sample by projecting it onto the space. We composed a contour map of the concentration on the space by using a linear interpolation connecting the training data. By examining the prediction performance using test data, we confirmed that it achieves errors of around or less than a few percent, except for cases with little training data. We proposed a couple of interesting applications of the feature space. First, a way to identify the relevancy of each peak in characterizing the XRD features is proposed: the idea is implemented as the observation of the location shift of a point on the feature space when the peak in question is masked from the original XRD pattern. By this method, we found a non-trivial case of a peak having considerable intensity but little relevancy, for which we could not give a reasonable account of the tiny relevancy from a physics viewpoint (e.g., higher-order reflections etc.). As another application, we proposed how to interpolate XRD patterns to avoid expensive ab initio simulations, with their difficulty in handling tiny changes in concentration. The interpolation can be made on the feature space, and hence the auto-encoder can generate an artificial but plausible XRD pattern for the interpolated point with a desired composition. The approach can be regarded as a useful remedy to achieve finer resolutions of concentration while avoiding the computational cost when handled by ab initio simulations.
V Acknowledgments
The computations in this work have been performed using the facilities of the Research Center for Advanced Computing Infrastructure at JAIST. R.M. is grateful for financial support from MEXT-KAKENHI (19H04692 and 16KK0097), from FLAGSHIP2020 (project nos. hp190169 and hp190167 at K-computer), from Toyota Motor Corporation, from the I-O DATA Foundation, from the Air Force Office of Scientific Research (AFOSR-AOARD/FA2386-17-1-4049; FA2386-19-1-4015), and from JSPS Bilateral Joint Projects (with India DST). The X-ray diffraction (XRD) measurements for the Sm-Fe-Ti system were performed at beamline BL02B2 of SPring-8 under proposal No. 2017B1634.
References
- Ikebata et al. (2017) H. Ikebata, K. Hongo, T. Isomura, R. Maezono, and R. Yoshida, Journal of computer-aided molecular design 31, 379 (2017).
- Oses et al. (2018) C. Oses, E. Gossett, D. Hicks, F. Rose, M. J. Mehl, E. Perim, I. Takeuchi, S. Sanvito, M. Scheffler, Y. Lederer, O. Levy, C. Toher, and S. Curtarolo, Journal of Chemical Information and Modeling 58, 2477 (2018), pMID: 30188699, https://doi.org/10.1021/acs.jcim.8b00393 .
- Ye et al. (2018) S. Ye, Z. Niu, P. Yang, and J. Sun, in 2018 Chinese Control And Decision Conference (CCDC) (2018) pp. 3572–3577.
- Hinton and Salakhutdinov (2006) G. E. Hinton and R. R. Salakhutdinov, Science 313, 504 (2006), https://science.sciencemag.org/content/313/5786/504.full.pdf .
- Salakhutdinov and Hinton (2009) R. Salakhutdinov and G. Hinton, International Journal of Approximate Reasoning 50, 969 (2009), special Section on Graphical Models and Information Retrieval.
- Cho (2013) K. Cho, in ICML (3) (2013) pp. 432–440.
- Sakurada and Yairi (2014) M. Sakurada and T. Yairi, in Proceedings of the MLSDA 2014 2Nd Workshop on Machine Learning for Sensory Data Analysis, MLSDA’14 (ACM, New York, NY, USA, 2014) pp. 4:4–4:11.
- Huang et al. (2018) H. Huang, Z. Li, R. He, Z. Sun, and T. Tan, in Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Curran Associates, Inc., 2018) pp. 52–63.
- Kusne et al. (2014) A. G. Kusne, T. Gao, A. Mehta, L. Ke, M. C. Nguyen, K.-M. Ho, V. Antropov, C.-Z. Wang, M. J. Kramer, C. Long, and I. Takeuchi, Scientific Reports 4, 6367 (2014), article.
- Suram et al. (2017) S. K. Suram, Y. Xue, J. Bai, R. Le Bras, B. Rappazzo, R. Bernstein, J. Bjorck, L. Zhou, R. B. van Dover, C. P. Gomes, and J. M. Gregoire, ACS Combinatorial Science 19, 37 (2017), pMID: 28064478, https://doi.org/10.1021/acscombsci.6b00153 .
- Iwasaki et al. (2017) Y. Iwasaki, A. G. Kusne, and I. Takeuchi, npj Computational Materials 3, 4 (2017).
- Li et al. (2018) S. Li, Z. Xiong, and J. Hu, Materials Science and Technology 34, 315 (2018), https://doi.org/10.1080/02670836.2017.1389116 .
- Stanev et al. (2018) V. Stanev, V. V. Vesselinov, A. G. Kusne, G. Antoszewski, I. Takeuchi, and B. S. Alexandrov, npj Computational Materials 4, 43 (2018).
- Xing et al. (2018) H. Xing, B. Zhao, Y. Wang, X. Zhang, Y. Ren, N. Yan, T. Gao, J. Li, L. Zhang, and H. Wang, ACS Combinatorial Science 20, 127 (2018), pMID: 29381327, https://doi.org/10.1021/acscombsci.7b00171 .
- Utimula et al. (2018) K. Utimula, R. Hunkao, M. Yano, H. Kimoto, K. Hongo, S. Kawaguchi, S. Suwanna, and R. Maezono, “Machine learning clustering technique applied to powder x-ray diffraction patterns to distinguish alloy substitutions,” (2018), arXiv:1810.03972 .
- Hongo et al. (2018) K. Hongo, S. Kurata, A. Jomphoak, M. Inada, K. Hayashi, and R. Maezono, Inorganic Chemistry 57, 5413 (2018), pMID: 29658713, https://doi.org/10.1021/acs.inorgchem.8b00381 .
- Dimiduk et al. (2018) D. M. Dimiduk, E. A. Holm, and S. R. Niezgoda, Integrating Materials and Manufacturing Innovation 7, 157 (2018).
- Nash et al. (2018) W. Nash, T. Drummond, and N. Birbilis, npj Materials Degradation 2, 1 (2018).
- Rietveld (1969) H. Rietveld, Journal of applied Crystallography 2, 65 (1969).
- Vegard (1921) L. Vegard, Zeitschrift für Physik 5, 17 (1921).
- Denton and Ashcroft (1991) A. R. Denton and N. W. Ashcroft, Phys. Rev. A 43, 3161 (1991).
- Jauch and Palmer (1993) W. Jauch and A. Palmer, Acta Crystallographica Section A 49, 590 (1993).
- Wei (1985) W. Wei, Journal of Applied Crystallography 18, 442 (1985), https://onlinelibrary.wiley.com/doi/pdf/10.1107/S0021889885010688 .
- Gull et al. (1987) S. F. Gull, A. K. Livesey, and D. S. Sivia, Acta Crystallographica Section A 43, 112 (1987).
- Sakata and Sato (1990) M. Sakata and M. Sato, Acta Crystallographica Section A 46, 263 (1990).
- Paszke et al. (2019) A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., in Advances in Neural Information Processing Systems (2019) pp. 8024–8035.
- Glorot et al. (2011) X. Glorot, A. Bordes, and Y. Bengio, in Proceedings of the fourteenth international conference on artificial intelligence and statistics (2011) pp. 315–323.
- Kingma and Ba (2014) D. P. Kingma and J. Ba, arXiv preprint arXiv:1412.6980 (2014).
- MacQueen (1967) J. MacQueen, in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1 (Oakland, CA, USA, 1967) pp. 281–297.
- Utimula et al. (2019) K. Utimula, K. Nakano, G. I. Prayogo, K. Hongo, and R. Maezono, arXiv preprint arXiv:1911.08071 (2019).