Large-scale Materials Generation Model on Supercomputer: Methods, Principles, and Applications

Pin Chen, Hui Yan, Qing Mo, Zexin Xu, Yutong Lu
National Supercomputer Center in Guangzhou, School of Computer Science and Engineering,
Sun Yat-sen University, Guangzhou, 510006, China

EXTENDED ABSTRACT: In the nearly three years since its release, AlphaFold2 has attracted widespread attention in biology and drug design, and it is regarded as a significant breakthrough in protein structure prediction. In materials science, the analogous problem is crystal structure prediction (CSP): predicting a crystal structure directly from its chemical composition alone, which has been a long-standing goal of computational materials research. Software such as CALYPSO and USPEX has been developed for this purpose. However, there have been few reports of predicting crystal structures directly and rapidly with artificial intelligence (AI) methods. In this report, we draw inspiration from diffusion models, which have been extensively validated in computer vision, and apply them to CSP. We propose an AI model that applies equivariant diffusion-based methods to CSP, with joint diffusion on lattices and fractional coordinates, and we validate it on multiple test datasets. To the best of our knowledge, this is the first report of a general CSP model based on diffusion methods. Furthermore, to improve the model's generality and generalization capability, we collected nearly 2 million experimentally or computationally verified stable structures by integrating existing crystal structure databases such as ICSD, COD, MP, and OQMD. These data were used to train our self-developed AI model. Using the GPU resources of the "Tianhe-2" supercomputer, we successfully developed a large-scale model for CSP. To validate the effectiveness of the model, we designed a series of potential superconducting materials with higher superconducting transition temperatures (Tc), starting from limited superconducting material data. Additionally, using our self-developed generative model, we constructed a hypothetical crystal structure database with over 10 million entries for the design and exploration of new materials.
The relevant data, deep learning models, and software code in this report are all available on the Matgen platform (https://matgen.nscc-gz.cn).
Keywords: Crystal structure prediction; diffusion model; big database 
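
Illustrative sketch: the abstract mentions joint diffusion on lattices and fractional coordinates without giving details. As a rough illustration only, the forward (noising) half of such a process might look like the Python sketch below. Everything in it is an assumption made for illustration and is not taken from the authors' model: the function names, the variance-preserving noise on the lattice matrix, the wrapped Gaussian noise on the fractional coordinates, and the parameters alpha_bar and sigma are all hypothetical.

```python
import math
import random


def wrap_unit(x):
    """Map a real number into [0, 1), reflecting lattice periodicity."""
    w = x % 1.0
    # Guard against floating-point rounding that can return exactly 1.0.
    return 0.0 if w >= 1.0 else w


def noise_lattice(lattice, alpha_bar, rng):
    """Assumed variance-preserving noising of a 3x3 lattice matrix.

    alpha_bar = 1 leaves the lattice untouched; alpha_bar = 0 gives pure noise.
    """
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [[a * x + b * rng.gauss(0.0, 1.0) for x in row] for row in lattice]


def noise_frac_coords(frac_coords, sigma, rng):
    """Assumed wrapped-Gaussian noising of fractional coordinates."""
    return [
        [wrap_unit(x + sigma * rng.gauss(0.0, 1.0)) for x in atom]
        for atom in frac_coords
    ]


def forward_step(lattice, frac_coords, alpha_bar, sigma, seed=0):
    """Noise the lattice and the fractional coordinates jointly."""
    rng = random.Random(seed)
    noisy_lattice = noise_lattice(lattice, alpha_bar, rng)
    noisy_coords = noise_frac_coords(frac_coords, sigma, rng)
    return noisy_lattice, noisy_coords


if __name__ == "__main__":
    # Hypothetical two-atom cubic cell with a 4 Angstrom edge.
    cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
    atoms = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
    noisy_cell, noisy_atoms = forward_step(cell, atoms, alpha_bar=0.5, sigma=0.3)
    print(noisy_cell)
    print(noisy_atoms)
```

In a full diffusion model, a learned denoising network would reverse these steps to generate a structure from noise. Wrapping the coordinates into [0, 1) is one simple way to respect periodicity, and other choices are possible.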

Brief Introduction of Speaker
Pin Chen

Pin Chen, Ph.D. (Sun Yat-sen University), is an engineer at the National Supercomputer Center in Guangzhou. He is currently engaged mainly in developing software, databases, platforms, AI methods, and applications in the field of Materials Genome Engineering. He has published more than 10 papers as first (including co-first) or corresponding author in academic journals such as Cell Metab., Sci. Adv., Angew. Chem. Int. Ed., and Adv. Funct. Mater. He is the core designer and developer of the Matgen platform. His main research interests are artificial intelligence and high-performance computing.