EXTENDED ABSTRACT: Combining machine learning with CALPHAD is a promising approach to materials design [1]. However, the small-data dilemma makes many cutting-edge data-driven methods less effective [2]. Moreover, traditional machine learning methods, although suitable for small datasets, still face three major issues: (a) the criteria for algorithm selection are unclear; (b) manual parameter tuning introduces human bias [3]; (c) a single metric is inadequate for comprehensively evaluating diverse models. To achieve high prediction accuracy with a small amount of data, this paper presents the Auto-APE framework, which combines diverse regression algorithms, automated tuning methods, and a comprehensive set of evaluation metrics to recommend the best model. Based on this framework, leave-one-out elimination and addition methods are integrated for data screening. Symbolic regression is then used to generate additional features that strengthen the correlation between the features and the target attribute, yielding an improved posterior model. Finally, the workflow is applied to hardness prediction for 273 Al-Co-Cr-Cu-Fe-Ni high-entropy alloys. The 10-fold cross-validation (CV) RMSE of the best model is reduced by 32% after data screening and by an additional 7% after feature addition, demonstrating the effectiveness of data refinement and the potential for posterior model components to substitute for prior knowledge. The Auto-APE framework has the potential to provide an unbiased modeling and evaluation strategy that accelerates the application of machine learning in materials design.
Keywords: Data augmentation; Feature engineering; Symbolic regression; High-entropy alloys
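The abstract does not spell out how the leave-one-out elimination step works, so the following is only a minimal sketch of one plausible reading: repeatedly drop the single sample whose removal lowers the cross-validated RMSE the most, and stop once the gain becomes small. The function names (`fit_line`, `cv_rmse`, `loo_elimination`), the `min_gain` threshold, the one-feature least-squares line standing in for the recommended model, and the synthetic data are all assumptions made for illustration; none of them come from the Auto-APE framework or the alloy dataset, and the complementary addition step is not shown.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (stand-in for the tuned model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def cv_rmse(xs, ys, k=5):
    """k-fold cross-validated RMSE with deterministic folds (position mod k)."""
    sq_err = 0.0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - (a * xs[i] + b)) ** 2 for i in test)
    return math.sqrt(sq_err / len(xs))

def loo_elimination(xs, ys, k=5, min_gain=0.2):
    """Greedily remove the sample whose removal lowers the CV RMSE the most.

    Stops when the best relative improvement is below min_gain (hypothetical
    threshold) or when too few samples remain. Returns the removed indices
    (relative to the original data) and the final CV RMSE.
    """
    keep = list(range(len(xs)))
    removed = []
    best = cv_rmse(xs, ys, k)
    while len(keep) > 2 * k:
        scores = []
        for i in keep:
            rest = [j for j in keep if j != i]
            score = cv_rmse([xs[j] for j in rest], [ys[j] for j in rest], k)
            scores.append((score, i))
        score, idx = min(scores)
        if best - score < min_gain * best:
            break  # no single removal helps enough; stop screening
        keep.remove(idx)
        removed.append(idx)
        best = score
    return removed, best

# Synthetic example: a noisy line with one planted outlier at index 7.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + 0.1 * (-1) ** i for i, x in enumerate(xs)]
ys[7] += 30.0

rmse_before = cv_rmse(xs, ys)
removed, rmse_after = loo_elimination(xs, ys)
print(removed, round(rmse_before, 3), round(rmse_after, 3))
```

In this toy setting the planted outlier is the first sample removed. A real screening step would use the tuned model and the metric set recommended by the framework rather than a single line fit and a single RMSE threshold.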
REFERENCES:
[1] Liu, Z. K., Acta Materialia, 200 (2020): 745–792.
[2] Xu, P., et al., npj Computational Materials, 9 (2023): 42.
[3] Batra, R., et al., Nature Chemistry, 14 (2022): 1427–1435.
Yu Zhigang is an associate researcher at Shanghai University and was selected for the Shanghai "Oriental Talents Program" (2023). He holds a doctorate in metallurgical physical chemistry and was a postdoctoral fellow in materials science and engineering, under the guidance of Academician Zhou Guozhi. His main research interests are multi-scale computational methods and machine-learning-based design of advanced structural materials, and he has a solid theoretical foundation in materials design. In the last five years he has published 18 SCI papers in the Journal of Materials Science & Technology, the Journal of Chemical Theory and Computation, Materials & Design, and other journals, has applied for 7 invention patents, and has presided over 4 projects of the National Natural Science Foundation.