Automated Pipeline for Superalloy Property Extraction by Text Mining from Materials Science Literature
Xue Jiang 1,*, Weiren Wang 1,+, Depeng Dang 2, Yanjing Su 1,*, Jianxin Xie 1
1 University of Science and Technology Beijing, Beijing, 100083, China
2 Beijing Normal University, Beijing, 100875, China
ABSTRACT: Artificial intelligence is accelerating the discovery of new materials, and its success depends on large amounts of structured data. In this paper, we propose a new natural language processing (NLP) pipeline to automatically mine structured data on superalloys from the literature, in which a practical rule-based named entity recognition (NER) method and an effective heuristic multiple-relation extraction algorithm are developed to overcome the limitations of small corpora in text mining. Within 30 minutes, 1258 records were extracted from 8917 superalloy articles, covering γ′ solvus temperature, density, solidus temperature, and liquidus temperature. The F1 score of NER for alloy named entities was 92.07%, much higher than the 55.54% and 24.86% achieved by a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) layer (BiLSTM-CRF) and by the ChemDataExtractor tool, respectively. The F1 score of relation extraction for γ′ solvus temperature was 79.37%, better than the 36.84% obtained by the well-known “Snowball” semi-supervised algorithm. This is the first report of a text mining pipeline for superalloys that automatically generates property databases in a practical and effective manner. We also provide a web-based toolkit as an online open-source platform (http://SuperalloyDigger.mgedata.cn), which forms a basis for further application of our text mining pipeline.
Keywords: Superalloy; Text mining; Named entity recognition; Relation extraction; Database
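As a rough illustration of how a rule-based NER step can feed a heuristic relation-extraction step, the Python sketch below turns sentences into (alloy, property, value, unit) records and scores them with the standard F1 measure, F1 = 2PR/(P + R), where P is precision and R is recall. It is not the authors' implementation: the regular expressions, the property keyword table, the "first property keyword per sentence" simplification, the pairing rules, all names (`ALLOY_RE`, `PROPERTY_RES`, `extract_records`, `f1_score`), and the invented alloy names and values in the example are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# (alloy, property, value, unit)
Record = Tuple[str, str, float, str]

# Hypothetical NER rules for illustration only (not the authors' patterns).
# Alloy names like "AB-1" or "XY7": two or more capitals, optional hyphen, digits.
ALLOY_RE = re.compile(r"\b[A-Z]{2,}[A-Za-z]*-?\d+[A-Za-z]*\b")

# Property keywords mapped to canonical property names.
PROPERTY_RES: Dict[str, re.Pattern] = {
    "gamma_prime_solvus": re.compile(r"γ[′']\s*solvus", re.IGNORECASE),
    "solidus": re.compile(r"\bsolidus\b", re.IGNORECASE),
    "liquidus": re.compile(r"\bliquidus\b", re.IGNORECASE),
    "density": re.compile(r"\bdensit(?:y|ies)\b", re.IGNORECASE),
}

# Units accepted for each property; numbers with other units are ignored.
PROPERTY_UNITS: Dict[str, Set[str]] = {
    "gamma_prime_solvus": {"°C", "K"},
    "solidus": {"°C", "K"},
    "liquidus": {"°C", "K"},
    "density": {"g/cm3"},
}

# A number followed by a recognized unit, e.g. "1290 °C" or "8.6 g/cm3".
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(°C|K|g/cm3)(?![A-Za-z])")


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_records(text: str) -> List[Record]:
    records: List[Record] = []
    for sent in split_sentences(text):
        # Rule-based NER: alloy mentions in textual order.
        alloys = [m.group(0) for m in ALLOY_RE.finditer(sent)]
        hits = [(m.start(), name) for name, rx in PROPERTY_RES.items()
                for m in rx.finditer(sent)]
        if not alloys or not hits:
            continue
        # Keep only the property keyword that appears first (simplification).
        prop = min(hits)[1]
        values = [(float(m.group(1)), m.group(2)) for m in VALUE_RE.finditer(sent)
                  if m.group(2) in PROPERTY_UNITS[prop]]
        # Heuristic relation extraction:
        #  - equal counts: pair alloys and values in order ("respectively" pattern)
        #  - a single alloy: attach the first compatible value
        if len(alloys) == len(values):
            pairs = list(zip(alloys, values))
        elif len(alloys) == 1 and values:
            pairs = [(alloys[0], values[0])]
        else:
            pairs = []
        records += [(a, prop, v, u) for a, (v, u) in pairs]
    return records


def f1_score(predicted: Set[Record], gold: Set[Record]) -> float:
    # Harmonic mean of precision and recall over exact-match records.
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented alloy names and values, for illustration only.
    sample = ("The γ′ solvus temperatures of AB-1 and AB-2 are 1290 °C and 1350 °C, "
              "respectively. The density of XY7 is 8.6 g/cm3. Samples were aged for 16 h.")
    for rec in extract_records(sample):
        print(rec)
```

The equal-count rule is one simple way to recover several relations from a single "respectively" sentence; real articles would need further rules, for example for sentences that mention several properties, value ranges, or alloy names outside this toy pattern.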
Xue Jiang received her bachelor's and master's degrees in computer science and technology from Beijing Normal University and her PhD in materials science and engineering from the University of Science and Technology Beijing. Her research lies at the intersection of materials science and information technology, including materials database technology, big data, machine-learning-aided materials design, and text mining for materials design applications. She has published more than 10 papers in journals such as Scripta Materialia, Calphad, and Computational Materials Science.