EXTENDED ABSTRACT: Data-driven machine learning (ML) has shown excellent capabilities in materials property prediction and novel materials discovery, and the upper limit of an ML model's performance is determined by the quality of its input data. Materials data come from multiple sources, involve small sample sizes and high-dimensional feature spaces, have diverse structures, and carry high uncertainty. These characteristics make it difficult to ensure the integrity, consistency, and accuracy of the data, which can introduce undesirable bias into ML modeling. Focusing on both the quality and the quantity of materials data, this report proposes a new data quality governance method driven by both data and knowledge. The method is built on an ML framework with embedded domain knowledge and offers an effective strategy for materials data governance. Moreover, the report introduces the research group's work on domain-knowledge-embedded data quality governance and discusses its applications and prospects in two areas: mining the creep structure-activity relationships of nickel-based single-crystal superalloys, and predicting the activation energy of solid electrolyte materials for energy storage batteries.
Keywords: Data Quality; Machine Learning; Domain Knowledge; Materials Property Prediction; Novel Materials Discovery
Siqi Shi obtained his B.S. and M.S. degrees from Jiangxi Normal University in 1998 and 2001, respectively. He received his Ph.D. from the Institute of Physics, Chinese Academy of Sciences, in 2004. He then worked as a senior research associate at the National Institute of Advanced Industrial Science and Technology, Japan, and at Brown University, USA. In early 2013, he joined Shanghai University as a professor. His current research focuses on the fundamentals and multiscale calculation of electrochemical energy storage materials, as well as on materials design and performance optimization using machine learning.