6-15. The Construction of the High-Throughput Materials Simulation Environment

Xiangfei Meng*, Geng Li, Xiaoqian Zhu, Xiaodong Jian

National Supercomputer Center in Tianjin, Tianjin 300457, China

Abstract: A high-throughput computing (HTC) infrastructure is key to accelerating materials discovery in materials genome engineering. Computer scientists and materials researchers have devoted considerable effort to building HTC environments. In the United States, for instance, the Materials Project is hosted at the National Energy Research Scientific Computing Center (NERSC), and a collaboration between Cray and Duke University developed the Automatic-FLOW for Materials Discovery (AFLOW) framework. In Europe, the Novel Materials Discovery (NOMAD) Repository, the largest repository for computational materials science worldwide, is built on the Barcelona Supercomputing Center. The National Supercomputer Center in Tianjin, one of the best-known supercomputing centers in China, has constructed an HTC infrastructure for materials, named the computational platform of China Materials Genome Engineering (CNMGE), which features automatic workflows, high concurrency and multi-scale calculation.

To meet the requirements of materials HTC, we designed an integrated environment that combines supercomputing, cloud computing and big-data management, and developed a materials HTC system and a data management system. The core of the materials HTC system is a set of automatic calculation workflows organized by material functional properties. According to the material scale, we divided the calculations into four parts: microscopic, mesoscopic, macroscopic and cross-scale calculation. Each part is further divided into several computing function workflows. Each computing function workflow consists of the following steps: generating input files, configuring input files, running HTC jobs to generate raw data, running analysis programs to produce organized data, filtering the data, automatically storing the material results in the database, and letting users query the data through an API. To date, several basic materials calculation workflows have been implemented on our platform.
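The workflow steps listed above can be pictured as a simple chain of stages. The following Python sketch is only an illustration under stated assumptions: every name in it (`Job`, `generate_inputs`, `run_calculation`, `cutoff`, and so on) is hypothetical and not part of the CNMGE platform, the "calculation" is a placeholder rather than a real simulation, and a plain dictionary stands in for the database.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """One material passing through the workflow (hypothetical structure)."""
    material: str
    inputs: dict = field(default_factory=dict)
    raw: dict = field(default_factory=dict)
    result: dict = field(default_factory=dict)


def generate_inputs(job):
    # Step 1: generate input files (here just a dictionary).
    job.inputs = {"structure": job.material}
    return job


def configure_inputs(job):
    # Step 2: configure the inputs; "cutoff" is an illustrative parameter.
    job.inputs["cutoff"] = 500
    return job


def run_calculation(job):
    # Step 3: placeholder for submitting an HTC job; returns fake raw data.
    job.raw = {"energy": -1.0 * len(job.material)}
    return job


def analyse(job):
    # Step 4: turn raw output into an organized record.
    job.result = {"material": job.material, "energy": job.raw["energy"]}
    return job


STEPS = [generate_inputs, configure_inputs, run_calculation, analyse]


def run_workflow(materials, keep, database):
    """Run every step for each material, filter, and store the results."""
    for name in materials:
        job = Job(name)
        for step in STEPS:
            job = step(job)
        if keep(job.result):              # Step 5: filter the data
            database[name] = job.result   # Step 6: store automatically
    return database


def query(database, name):
    # Step 7: query interface for users (None if nothing was stored).
    return database.get(name)
```

A call such as `run_workflow(["Si", "GaAs"], lambda r: r["energy"] < -3, {})` runs both materials through all stages and stores only those that pass the filter. In a real deployment each stage would presumably wrap a scheduler submission or an analysis program rather than an in-process function.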

The materials data management system is built on the MySQL and MongoDB database engines. Its core functions include computation control, results analysis, data dissemination and data validation. In addition, it interacts with the web service to upload and download files, display graphical results, and query data. The system contains three databases: a software information library, an atomic potential library and a material property database.
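As a rough picture of how the three databases and the validation function might relate, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the MySQL and MongoDB engines named above. The table names, columns, and helper functions are assumptions made for illustration and do not describe the platform's actual schema.

```python
import sqlite3


def create_schema(conn):
    """Create three illustrative tables, one per library/database."""
    conn.executescript(
        """
        CREATE TABLE software_info (
            name TEXT PRIMARY KEY,
            version TEXT
        );
        CREATE TABLE atom_potential (
            element TEXT,
            potential TEXT,
            PRIMARY KEY (element, potential)
        );
        CREATE TABLE material_property (
            formula TEXT,
            property TEXT,
            value REAL,
            software TEXT REFERENCES software_info(name),
            PRIMARY KEY (formula, property)
        );
        """
    )


def store_property(conn, formula, prop, value, software):
    """Validate, then store one computed property (hypothetical helper)."""
    # Data validation: the producing software must be registered first.
    known = conn.execute(
        "SELECT 1 FROM software_info WHERE name = ?", (software,)
    ).fetchone()
    if known is None:
        raise ValueError(f"unknown software: {software}")
    conn.execute(
        "INSERT OR REPLACE INTO material_property VALUES (?, ?, ?, ?)",
        (formula, prop, value, software),
    )


def query_property(conn, formula, prop):
    """Return the stored value, or None if no record exists."""
    row = conn.execute(
        "SELECT value FROM material_property "
        "WHERE formula = ? AND property = ?",
        (formula, prop),
    ).fetchone()
    return None if row is None else row[0]
```

For example, after `create_schema(conn)` on an in-memory connection and registering a software entry, `store_property(conn, "Si", "demo_property", 1.5, "demo_code")` stores a placeholder value that `query_property` can then retrieve. Rejecting records from unregistered software is only one possible reading of "data validation".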

We have completed the overall framework and a prototype platform with several practical functions, which is now open for users to test and operate. We expect to make further progress in developing workflows for more material properties, in multi-scale calculation, and in a material data sharing service based on free and incentive mechanisms.

Keywords: high-throughput calculation; Tianhe series supercomputers; automatic workflow; materials data management system




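Abstract (translated from Chinese): A high-throughput materials computing platform is an organic combination of efficient materials research and development with high-performance computing. It is an important foundational component of materials genome engineering and a shared frontier vehicle for innovation in international new-materials research, and international teams have already begun developing and exploring such platforms. A prominent feature of these platforms is that they are built on supercomputing systems in response to the current needs of efficient materials research and development; examples include the Materials Project at NERSC, Automatic-Flow built by Cray, and NOMAD, built on the Barcelona Supercomputing Centre. Relying on the National Supercomputer Center, China's independently developed petascale (P-level) supercomputing platform and the exascale (E-level) platform now under development, we are building the "China Materials High-Throughput Computing Platform (CNMGE)", which provides the automatic workflows, high concurrency and multi-scale capabilities that high-throughput materials computation requires.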





Brief Introduction of Speaker


Tel: 022-65375551; Email: mengxf@nscc-tj.cn