A Large Multi-Modality Model for Chemistry and Materials Science

EXTENDED ABSTRACT: Rapid developments in AI tools are expected to offer unprecedented assistance to research in chemistry and materials science. However, neither existing task-specific models nor emerging general-purpose large language models (LLMs) can cover the wide range of data modalities and task categories involved. The specialized language and knowledge used in the field, including various forms of molecular representations and spectroscopic methods, hinder the performance of general-domain LLMs in these disciplines. We first developed a 13B LLM trained on 34B tokens from chemical literature, textbooks, and instructions. The resulting model, ChemDFM [1], can store, understand, and reason over chemical knowledge while retaining generic language comprehension capabilities. In our quantitative evaluation, ChemDFM surpasses GPT-4 on most chemical tasks, despite the significant size difference. In an extensive third-party test [2], ChemDFM significantly outperforms most representative open-source LLMs. We further developed a multi-modal LLM for chemistry and materials science: ChemDFM-X. Diverse multimodal data, including SMILES, GNN, mass spectroscopy, and IR spectroscopy, among others, were used to build a large domain-specific training corpus containing 7.6M data entries. ChemDFM-X is evaluated in extensive experiments on various cross-modality tasks. The results demonstrate the great potential of ChemDFM-X in inter-modal knowledge comprehension. This study illustrates the potential of LLMs as co-scientists across the general area of chemistry and materials science tasks. A few examples of using ChemDFM-X to assist materials research will be demonstrated.

Keywords: Multi-modality, Large Language Model, Spectroscopy, Materials Science

Brief Introduction of Speaker
Zihan Zhao

Dr. Xin Chen earned his B.E. in Materials Science from the University of Science and Technology of China and his Ph.D. in Chemistry from Stanford University. He was a professor at Boston University and a senior researcher at Gusu Laboratory. Dr. Chen recently joined Suzhou Laboratory as a fellow scientist and is currently in charge of a National Science and Technology Major Project to build an AI for Materials Science platform in the national laboratory. His recent research interests focus on combining generative AI with chemical and physical domain knowledge to assist materials design, including using spectroscopy-embedded CNNs for property prediction and developing multimodal language models specialized in chemistry and materials science as potential co-scientists.