The 4th Forum of Materials Genome Engineering

S-4-07 Automated Pipeline For Superalloy Property Extraction by Text Mining from Materials Science Literature

Xue Jiang ¹,*, Weiren Wang ¹,+, Depeng Dang ², Yanjing Su ¹,*, Jianxin Xie ¹

¹ University of Science and Technology Beijing, Beijing, 100083, China

² Beijing Normal University, Beijing, 100875, China

ABSTRACT: Artificial intelligence is accelerating the discovery of new materials, and a prerequisite for its success is a large volume of structural data. In this paper, we propose a new natural language processing (NLP) pipeline that automatically mines structural data on superalloys from the literature. To overcome the obstacle that small corpora pose for text mining, we develop a practical rule-based named entity recognition (NER) method and an effective heuristic multiple-relation extraction algorithm. Within 30 minutes, 1258 records were extracted from 8917 superalloy articles, covering γ′ solvus temperature, density, solidus temperature, and liquidus temperature. The F1 score of NER for alloy named entities reached 92.07%, much higher than the 55.54% and 24.86% achieved by the bidirectional long short-term memory network with a conditional random field layer (BiLSTM-CRF) model and the ChemDataExtractor tool, respectively. The F1 score of relation extraction for γ′ solvus temperature was 79.37%, better than the 36.84% obtained by the well-known “Snowball” semi-supervised algorithm. This is the first report of a text mining pipeline for superalloys that automatically generates property databases in a practical and effective manner. We also provide a web-based toolkit as an online open-source platform (http://SuperalloyDigger.mgedata.cn), forming a basis for further application of our text mining pipeline.
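As a rough illustration of the two ideas the abstract relies on, the sketch below shows a single hand-written rule for picking temperature values out of a sentence, and the standard F1 computation used to score extracted items against a gold set. The regular expression, the function names, and the example sentence are all assumptions made for illustration; they are not the rules or code of the pipeline described above.

```python
import re

# Illustrative rule only (an assumption, not the authors' rule set):
# a number followed by a temperature unit, either degrees Celsius or kelvin.
VALUE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(°C|K)(?![A-Za-z])")


def extract_temperatures(sentence):
    """Return (value, unit) pairs for every temperature mention in a sentence."""
    return [(float(value), unit) for value, unit in VALUE_PATTERN.findall(sentence)]


def f1_score(predicted, gold):
    """Compute F1 = 2PR / (P + R) over sets of extracted items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence; the density value is ignored because its unit
# does not match the temperature rule.
sentence = "The γ′ solvus temperature of the alloy is 1150 °C and the density is 8.2 g/cm3."
temperatures = extract_temperatures(sentence)

# Toy evaluation: two of three predicted items are correct, and two of three
# gold items are found, so precision and recall are both 2/3.
score = f1_score({"a", "b", "c"}, {"b", "c", "d"})
```

A rule of this kind needs no labelled training corpus, which is the practical appeal of rule-based extraction when only a small corpus is available.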

Keywords: Superalloy; Text mining; Named entity recognition; Relation extraction; Database

Brief Introduction of Speaker

Jiang Xue

Jiang Xue completed her bachelor's and master's degrees in computer science and technology at Beijing Normal University and her PhD in materials science and engineering at the University of Science and Technology Beijing. Her research is interdisciplinary, combining materials science with information technology: materials databases and big data technology, machine-learning-aided materials design, and text mining for materials design. She has published more than 10 papers in journals including Scripta Materialia, Calphad, and Computational Materials Science.