Data Driven Discovery of New Materials

Isao TANAKA*
1 Dept. Mater. Sci. Engrg., Kyoto University, Kyoto 606-8501, Japan
2 ESISM, Kyoto University, Kyoto 606-8501, Japan
3 NRL, JFCC, Nagoya 456-8587, Japan

ABSTRACT: Attempts to accelerate the discovery of materials with the aid of data-centric science have recently been well demonstrated. One approach uses a materials database generated by first-principles density functional theory (DFT) calculations. Thanks to recent progress in computational power and techniques, a large number of DFT calculations can now be made with accuracy comparable to experiments, and the results can be used for high-throughput screening of materials. Another approach uses machine-learning techniques to build a model that estimates the target property, so that the whole materials library can then be screened. A verification process is generally required to examine the predictive power of the model. Models and the quality of the screening can be improved iteratively through a Bayesian optimization process. This approach is useful when screening based upon DFT data is not practical, i.e., when the computational cost of the descriptors is too high to cover the whole library within a practical time frame. The same holds when the search space is too large to explore exhaustively. As an example of using a machine-learning model to screen a library, I will talk about the discovery of new crystals with low lattice thermal conductivity (LTC). We established our own LTC dataset computed by the first-principles anharmonic force constant method. Using approximately 100 theoretical LTC data points, we built a machine-learning model for LTC. All compounds registered in the Inorganic Crystal Structure Database (ICSD) were then ranked with respect to the predicted LTC. Finally, the candidate low-LTC compounds were validated by first-principles LTC calculations. A variety of compounds showing ultra-low LTC of approximately 0.1 W/(m K) at 300 K were thereby discovered.
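The screening loop described above (train a model on a small set of computed values, rank the whole library, validate the top candidates, and feed the results back) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the model used in the work: the descriptors are random numbers, `expensive_ltc` is a hypothetical toy function standing in for the first-principles LTC calculation, and a plain Gaussian-process regressor written with NumPy stands in for whatever machine-learning model and acquisition rule were actually employed.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of descriptor vectors."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, x_query, noise=1e-3):
    """Gaussian-process posterior mean and standard deviation at x_query."""
    y_mean, y_scale = y_train.mean(), y_train.std()
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    weights = np.linalg.solve(k_tt, (y_train - y_mean) / y_scale)
    mean = y_mean + y_scale * (k_qt @ weights)
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    std = y_scale * np.sqrt(np.clip(var, 0.0, None))
    return mean, std


def expensive_ltc(x):
    """Hypothetical stand-in for a first-principles LTC calculation."""
    return (x**2).sum(axis=1)


rng = np.random.default_rng(0)
library = rng.uniform(-2.0, 2.0, size=(500, 3))  # assumed descriptors

# Start from about 100 compounds whose target property is already computed.
known = rng.choice(len(library), size=100, replace=False)
values = expensive_ltc(library[known])

for _ in range(3):  # a few model-improvement rounds
    mean, std = gp_predict(library[known], values, library)
    mean[known] = np.inf  # never re-recommend an already computed compound
    picks = np.argsort(mean)[:5]  # lowest predicted values first
    # Validation step: run the expensive calculation for the top candidates
    # and add the results to the training data.
    values = np.concatenate([values, expensive_ltc(library[picks])])
    known = np.concatenate([known, picks])

best = known[np.argmin(values)]
print("best library index:", int(best), "value:", float(values.min()))
```

The returned standard deviation is not used in this simple ranking; in a Bayesian optimization setting it would typically enter an acquisition function that balances low predicted values against model uncertainty.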

In the second part of my talk, I will explain how the technique called a recommender system is useful for materials discovery. Recommender systems have been developed in the machine-learning community and have become increasingly popular in a variety of scientific and nonscientific areas. In the field of e-commerce, such as at amazon.com and netflix.com, recommender systems suggest items to a user. Such a system predicts the rating or preference for an item from an existing dataset of the users' history, such as items purchased and numerical ratings given to items, in order to generate a recommendation for the user. We were motivated to use matrix- and tensor-based recommender systems to systematically discover currently unknown inorganic compounds, or chemically relevant compositions (CRCs), based on an existing database. We employed the Inorganic Crystal Structure Database (ICSD), which collects literature data obtained mostly by experiments, as training data. Each element of our chemical space is a chemical composition, up to quinary systems, composed from 66 cations and 10 anions. Only simple compositions were considered; for example, max(a, b, c, d, x) = 20 for the quinary composition A_aB_bC_cD_dX_x. Nevertheless, the number of elements in the search space was very large, i.e., N = 2.3 × 10^10 for the quinary compositions. On the other hand, the number of training data registered in ICSD is very small: only 1,321 for the quinary compositions. In order to examine the success rate of the recommender system, we employed two other databases of known inorganic compounds, namely ICDD-PDF and Springer Materials. Since there are significant overlaps among the compounds registered in the three databases, we used compounds registered only in ICDD-PDF or Springer Materials, but not in ICSD, as the test data. The chemical composition information in the training data, i.e., the ICSD data, was put into a rating tensor.
For ternary compositions A_aB_bX_x, cation type A, cation type B, anion type X, and the integer set {a, b, x} were selected as the modes of the fourth-order rating tensor, which is a 66×66×10×198 tensor in the present study. When an A_aB_bX_x composition was registered in ICSD, the value of the corresponding rating tensor element was set to unity; otherwise the value was zero. The size of the rating tensor was then reduced, i.e., tensor factorization was performed, by adopting the Tucker decomposition technique. The performance of the tensor factorization in discovering currently unknown CRCs was evaluated by counting the number of correct answers, i.e., recommended compositions that are registered in the test database. For ternary, quaternary, and quinary compositions, 59, 52, and 15 of the top 100 recommended compositions were found in the test database, respectively. The success rate was therefore 59%, 52%, and 15% for the top 100 ranked compositions. Although the success rate decreased further down the ranking, it was still quite high even for the top 3,000 compositions. Considering that only a tiny portion of the tensor elements was used as the training dataset, the high success rate is noteworthy. It should also be emphasized that the high success rate of this recommender system was achieved with neither a DFT database nor other prior physical or chemical knowledge.
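The idea of scoring unregistered compositions through a low-rank Tucker model of a binary rating tensor can be illustrated with a toy example. The sketch below is written under several assumptions: the tensor is tiny (6×6×2×3 rather than 66×66×10×198), the "registered" entries are invented, and the Tucker factors are obtained by a truncated higher-order SVD, which is one simple way to compute a Tucker decomposition and not necessarily the algorithm used in the study. All function names are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize: bring `mode` to the front and flatten the other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along `mode`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)


def tucker_hosvd(tensor, ranks):
    """Truncated higher-order SVD: returns the core tensor and factors."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the low-rank approximation from core and factors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


def recommend(rating, ranks, top_k):
    """Rank unregistered entries by their reconstructed score."""
    core, factors = tucker_hosvd(rating, ranks)
    scores = reconstruct(core, factors)
    scores = np.where(rating > 0, -np.inf, scores)  # skip known entries
    flat = np.argsort(scores, axis=None)[::-1][:top_k]
    return [tuple(int(i) for i in np.unravel_index(f, rating.shape))
            for f in flat]


# Toy rating tensor with modes (cation A, cation B, anion X, integer set).
rating = np.zeros((6, 6, 2, 3))
rating[0:3, 3:6, 0, 1] = 1.0  # invented family of registered compositions
rating[3:6, 0:3, 1, 2] = 1.0  # a second invented family
rating[1, 4, 0, 1] = 0.0      # pretend this composition is still unknown

top = recommend(rating, ranks=(2, 2, 2, 2), top_k=1)
print(top)
```

In this toy case the held-out entry is surrounded by registered neighbours that share its cation, anion, and integer-set indices, so the low-rank reconstruction assigns it a clearly positive score while all other unregistered entries stay near zero.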

We also used the tensor-based recommender system to search for successful processing conditions for new compounds based on parallel experiments. Initially, an experimental database was constructed for 67 pseudobinary oxides registered in the ICSD by parallel experiments using 23 starting materials and 23 cation mixing ratios. Precursor powders were obtained by four synthesis methods (solid-state reaction, polymerized complex, cyclic ether sol-gel, and spray co-precipitation) and were fired at five different temperatures. This resulted in 1,648 unique chemical synthesis conditions and database entries. The reactants were characterized sequentially using powder X-ray diffraction equipment with an automatic sample exchanger. Each synthesis result was rated as a score, which was placed into a fifth-order tensor with 243,340 elements. The Tucker decomposition was used to predict the yet-to-be-rated scores for untested processing conditions. Good predictive performance of the present model was demonstrated by cross-validation. It was further evaluated by examining the presence of highly rated compositions in another database, ICDD-PDF. Successful processing conditions for untested compositions were found to be well recommended.
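The size of the fifth-order tensor is consistent with the counts given above if its five modes are taken to be two cation sources, the mixing ratio, the synthesis method, and the firing temperature. That assignment of modes is an inference from the numbers, not something stated explicitly here, and the mode names below are hypothetical. The short calculation also shows how sparsely the tensor is filled by the experiments.

```python
import math

# Assumed mode sizes for the fifth-order score tensor (inferred, see above).
mode_sizes = {
    "cation_source_a": 23,
    "cation_source_b": 23,
    "mixing_ratio": 23,
    "synthesis_method": 4,
    "firing_temperature": 5,
}

n_elements = math.prod(mode_sizes.values())
n_rated = 1648  # unique synthesis conditions in the experimental database
fraction_rated = n_rated / n_elements

print(n_elements)  # 243340
print(f"{100 * fraction_rated:.2f}% of the tensor elements are rated")
```

Under this reading, fewer than 1% of the tensor elements carry an experimental score, and the Tucker model is asked to fill in the rest.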

Brief Introduction of Speaker
Isao Tanaka

Isao Tanaka is a professor in the Department of Materials Science and Engineering, Kyoto University, Japan. He is also the director of the Elements Strategy Initiative for Structural Materials (ESISM), a guest researcher at the Japan Fine Ceramics Center (JFCC), and a member of the international advisory board of the European project FAIR-DI. Trained as a metal physicist, he received his B.E. and M.E. from Kyoto University and his Ph.D. from Osaka University. In 1987, he joined ISIR, Osaka University, as an assistant professor, where he studied the processing and characterization of high-purity silicon nitride. He received an Alexander von Humboldt Fellowship in 1992 and spent a year in Manfred Rühle’s research group at the Max Planck Institute for Metals Research in Stuttgart, Germany. He returned to Kyoto University in 1993 and began to use quantum mechanical calculations, combined with experimental techniques such as ELNES and XANES, to study fundamental issues in a wide range of ceramic materials. He also studied the electronic processes of defects, impurities, grain boundaries, and surfaces, and their roles in macroscopic properties. In the 2000s he made pioneering studies on the thermophysical properties of materials through first-principles phonon calculations. Recently he has been actively working on data-centric, or informatics, approaches to the discovery of new materials. The materials of his current interest are quite diverse, including solid-state ionics, oxide/nitride semiconductors, and structural ceramics and metals. He is author or coauthor of 410 papers. He has received a number of awards, including the Philipp Franz von Siebold Preis from the President of the Federal Republic of Germany.