Exploring the application of natural language processing in materials

Xue Jiang, Weiren Wang, Shaohan Tian, Turab Lookman, Yanjing Su, Jianxin Xie
Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and
Technology Beijing, Beijing, 100083, China

EXTENDED ABSTRACT: The rapid growth of the scientific literature has produced a vast store of valuable knowledge and data. Natural language processing (NLP) is key to unlocking this resource and can supply a steady stream of data for data-driven materials development. We have therefore developed an automated pipeline for extracting data from the materials literature, comprising literature acquisition, preprocessing, table parsing, text classification, named entity recognition, relation extraction, and dependency resolution. The pipeline automatically extracts alloy compositions, processing routes, and properties, producing a large-scale, high-quality dataset suitable for machine learning. Because alloy processing routes are complex and diverse, we propose a semi-supervised method for generating material processing dictionaries, which are then used to accurately identify action sequences in material processing texts. In recent years, large models such as BERT and GPT have been transforming NLP. Pre-trained on massive unlabeled corpora and then fine-tuned for specific tasks, these models show stronger general language understanding and can be specialized to a given task. Using approximately 4 million abstracts from the materials science literature and 90,000 full-text documents on steel, we trained SteelBERT, a pre-trained language model tailored to the steel domain. It performs well on tasks such as text classification and the encoding of chemical elements and processing actions, offering insights into how large models can address specific materials design problems.
Keywords: Text mining; Machine learning; Alloy synthesis and processing
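
The dictionary-based identification of processing action sequences described in the abstract can be illustrated with a minimal sketch. It covers only the lookup step, assuming a dictionary has already been built; the semi-supervised dictionary generation itself is not shown. The dictionary entries, the canonical action names, and the function names below are hypothetical and are not taken from the authors' pipeline.

```python
import re

# Hypothetical processing dictionary: surface form -> canonical action.
# In the described method such a dictionary is generated semi-supervised;
# here it is written by hand purely for illustration.
ACTION_DICTIONARY = {
    "hot rolled": "hot_roll",
    "cold rolled": "cold_roll",
    "solution treated": "solution_treat",
    "austenitized": "austenitize",
    "water quenched": "quench",
    "quenched": "quench",
    "tempered": "temper",
    "annealed": "anneal",
    "aged": "age",
}


def build_pattern(dictionary):
    """Compile one regex that matches any dictionary term."""
    # Longest terms first, so "water quenched" wins over "quenched".
    terms = sorted(dictionary, key=len, reverse=True)
    alternatives = []
    for term in terms:
        # Allow spaces or hyphens between the words of a multi-word term.
        words = [re.escape(word) for word in term.split()]
        alternatives.append(r"[\s-]+".join(words))
    return re.compile(r"\b(" + "|".join(alternatives) + r")\b", re.IGNORECASE)


ACTION_PATTERN = build_pattern(ACTION_DICTIONARY)


def extract_action_sequence(text):
    """Return canonical actions in the order they appear in the text."""
    actions = []
    for match in ACTION_PATTERN.finditer(text):
        # Normalize case and separators before the dictionary lookup.
        key = re.sub(r"[\s-]+", " ", match.group(1).lower())
        actions.append(ACTION_DICTIONARY[key])
    return actions


if __name__ == "__main__":
    sentence = (
        "The ingot was hot-rolled at 1100 C, austenitized at 900 C for "
        "30 min, water quenched, and then tempered at 200 C for 2 h."
    )
    print(extract_action_sequence(sentence))
```

A plain dictionary lookup like this ignores negation, ordering cues such as "prior to", and process parameters; it is meant only to show how a processing dictionary maps free text to an ordered action sequence.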
REFERENCES
[1] W. Wang, X. Jiang, S. Tian, P. Liu, T. Lookman, Y. Su, J. Xie. npj Computational Materials, 2023, 9: 183.
[2] W. Wang, X. Jiang, S. Tian, P. Liu, T. Lookman, Y. Su, J. Xie. npj Computational Materials, 2022, 8: 9.

Brief Introduction of Speaker
Xue Jiang

Xue Jiang, Ph.D., Associate Professor. In 2020, she obtained a Ph.D. degree in Materials Science and Engineering from the University of Science and Technology Beijing. Her research focuses on materials design driven by machine learning and text mining, and on materials databases. She has published more than 30 articles in journals such as npj Comput. Mater., Scripta Mater., npj Mater. Degrad., and ACS Appl. Mater. Interfaces, and holds 8 patents.