1-18. Accelerating First Principle Calculation by CPU/GPU Heterogeneous Parallelization

Ji Qi, Minghui Yang*

Wuhan Magnetic Resonance Center, State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences

Abstract: First-principles calculation is an important method in materials genome research. High-throughput computing demands the integration of efficient scientific software with advanced computer technology. Today, architectures that pair a main processor with a co-processor are widely used in supercomputers. Among the various heterogeneous architectures, the CPU+GPU framework is the most typical and the most widely used.

       To meet the demand for large-scale concurrent computing in materials genome research, we have been developing a first-principles quantum mechanics software package, WESP (Wuhan Electronic Structure Package). Specifically, WESP performs HF (Hartree-Fock) and DFT (Density Functional Theory) calculations with Gaussian-type basis functions and accelerates the evaluation of Electron Repulsion Integrals (ERIs) with CPU/GPU heterogeneous parallel computation. The main features of WESP include:

  1. Classification of ERIs by angular momentum: ERIs with low angular momentum are relatively simple to evaluate but very numerous, which suits a GPU equipped with many Arithmetic Logic Units (ALUs); in contrast, ERIs with high angular momentum are complex to evaluate but fewer in number, and can be computed more efficiently on a CPU with large caches. To take advantage of the GPU, the GPU code is developed for ERIs with low angular momentum.

  2. To make full use of the computing resources of both CPUs and GPUs, an OpenMP-based dynamic heterogeneous parallel architecture is developed: the ERI tasks are sorted by angular momentum in a work pool; the CPU grabs ERI tasks from high to low angular momentum, while the GPU works in the opposite direction. This scheme takes into account the respective strengths of the CPU and the GPU, and is applicable to other heterogeneous computers.
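The two features above can be illustrated with a minimal sketch. It is not WESP code: Python threads stand in for the OpenMP threads mentioned above, the names `build_task_pool`, `run_pool` and `total_l` are hypothetical, and the ERI classes are ranked simply by the sum of the angular momenta of their four shells, which is an assumed stand-in for whatever ordering WESP actually uses.

```python
import threading
from collections import deque
from itertools import combinations_with_replacement

# Angular momentum quantum number of each shell type.
SHELL_L = {"s": 0, "p": 1, "d": 2, "f": 3}


def total_l(quartet):
    """Total angular momentum of an ERI class (ab|cd)."""
    return sum(SHELL_L[shell] for shell in quartet)


def build_task_pool(shell_types):
    """List the distinct ERI classes (ab|cd), sorted from low to high total L."""
    pairs = list(combinations_with_replacement(shell_types, 2))
    quartets = [bra + ket for bra, ket in combinations_with_replacement(pairs, 2)]
    return sorted(quartets, key=total_l)


def run_pool(tasks, n_cpu_workers, n_gpu_workers):
    """Drain the pool from both ends; return (device, task) in claim order."""
    pool = deque(tasks)      # left end: lowest L, right end: highest L
    lock = threading.Lock()
    log = []

    def worker(device):
        while True:
            with lock:
                if not pool:
                    return
                # CPU workers take the most complex remaining class,
                # GPU workers take the simplest remaining class.
                task = pool.pop() if device == "cpu" else pool.popleft()
                log.append((device, task))
            # A real code would evaluate the integrals here, outside the lock.

    threads = [threading.Thread(target=worker, args=("cpu",)) for _ in range(n_cpu_workers)]
    threads += [threading.Thread(target=worker, args=("gpu",)) for _ in range(n_gpu_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    tasks = build_task_pool(["s", "p", "d"])
    for device, task in run_pool(tasks, n_cpu_workers=2, n_gpu_workers=1):
        print(device, "".join(task), total_l(task))
```

Because the workers in this sketch do no real work, the split between devices depends only on thread timing. In an actual calculation, the point where the two ends meet would shift with the relative speed of the CPU and GPU, which is what makes such a scheme dynamic.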

         Currently, the CPU/GPU heterogeneous parallel framework for HF calculations has been implemented in WESP, and it is compatible with most CPU/GPU heterogeneous computers regardless of the number of CPU cores and GPUs. Calculations for a series of benchmark systems (24-168 atoms with 260-4080 basis functions) indicate that HF calculations accelerated by heterogeneous parallelization in WESP are more efficient than those of the CPU-only quantum chemistry software Gaussian 09 and the GPU-only software TeraChem 1.91, as shown in Fig. 1. In future work, we will focus on: (1) a GPU algorithm for the exchange-correlation integrals in DFT calculations; (2) the development of a mixed-precision algorithm for the GPU. We hope that WESP will be an efficient research tool for materials and chemistry research.

Keywords: First Principle Calculation; CPU; GPU; Heterogeneous Parallelization; Electron Repulsion Integral

    

Figure 1: Speedups relative to Gaussian 09 for (a) HF/cc-pVDZ (up to d orbitals) and (b) HF/cc-pVTZ (up to f orbitals) calculations performed with Gaussian 09, TeraChem 1.91 and WESP. Gaussian 09 used 12 CPU cores (double precision; integral threshold set to 10e-10); TeraChem 1.91 used 1 GPU (double precision); WESP used 1 GPU (1g), 2 GPUs (2g), 12 CPU cores together with 1 GPU (double precision) (12c1g), and 12 CPU cores together with 2 GPUs (double precision) (12c2g), respectively. CPU: Intel Xeon E5-2630 v2; GPU: NVIDIA Tesla K20.
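Figure 1 (translated from the Chinese caption): Speedups relative to Gaussian 09 for (a) HF/cc-pVDZ calculations (up to d orbitals) and (b) HF/cc-pVTZ calculations (up to f orbitals) using Gaussian 09, TeraChem 1.91 and WESP. G09 used 12 CPU cores (double precision; integral threshold set to 10e-10); TeraChem 1.91 used 1 GPU (double precision); WESP used 1 GPU (1g), 2 GPUs (2g), 12 CPU cores + 1 GPU (double precision) (12c1g), and 12 CPU cores + 2 GPUs (double precision) (12c2g). CPU: Intel Xeon E5-2630 v2; GPU: NVIDIA Tesla K20.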


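Accelerating First-Principles Calculations by CPU/GPU Heterogeneous Parallelization (translated from the Chinese version)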

Ji Qi (齐记), Minghui Yang (杨明晖)*

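        Chinese Academy of Sciences, Innovation Academy for Precision Measurement Science and Technology, Wuhan Institute of Physics and Mathematics,
            State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Wuhan Magnetic Resonance Center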

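Abstract: First-principles calculation is one of the main methods of materials genome research, and high-throughput computing requires a close combination of efficient computational software and advanced computer technology. Today's supercomputers commonly adopt heterogeneous cooperative computing with a main processor and co-processors, of which the CPU/GPU heterogeneous model is the most typical and the most widely used.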

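     To meet the demand for large-scale concurrent first-principles calculations in materials genome research, we have developed the first-principles quantum mechanics software WESP (Wuhan Electronic Structure Package). The software performs HF (Hartree-Fock) and DFT (Density Functional Theory) calculations with Gaussian basis functions, and accelerates their time-consuming hot spot, the two-electron integrals, through CPU/GPU heterogeneous parallelization, so as to make full use of all computing resources within a single compute node. The main features of WESP include: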

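  1. Classification of two-electron integrals by angular momentum: low-angular-momentum two-electron integrals have relatively simple algorithms but are numerous, and are suited to GPUs with many processing units; high-angular-momentum two-electron integrals have complex algorithms and are far fewer in number, and are suited to CPUs with larger caches. Based on these characteristics, we developed GPU code for low-angular-momentum two-electron integrals and improved the existing CPU code.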

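  2. To fully exploit CPU and GPU computing resources, we developed a dynamic heterogeneous parallel framework for two-electron integrals: the different types of two-electron integrals are sorted by computational complexity in a task pool; the CPU grabs tasks from the pool in order of decreasing computational complexity, while the GPU does exactly the opposite, so that the CPU and the GPU give priority to high- and low-angular-momentum two-electron integrals, respectively. This parallel scheme takes full account of the computational characteristics of the different types of two-electron integrals and of the respective strengths of the CPU and the GPU. The dynamic heterogeneous parallel framework can be applied to other heterogeneous computers.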

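       WESP has now completed the CPU/GPU heterogeneous parallel part of the HF calculation, achieving heterogeneous parallelization across multiple CPU cores and multiple GPU cards. Tests on a series of model systems (24-168 atoms, 260-4080 basis functions) show that HF calculations accelerated by heterogeneous parallelization in WESP are more efficient than the CPU quantum chemistry software Gaussian 09 and the GPU quantum chemistry software TeraChem, as shown in Figure 1. In subsequent work, we will continue with: (1) development of a GPU algorithm for the DFT exchange-correlation integrals; (2) development of a mixed-precision GPU algorithm. We hope that WESP can provide an efficient research tool for research in materials, chemistry and other fields.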

Keywords: First-Principles Calculation; CPU; GPU; Heterogeneous Parallelization; Two-Electron Integral

Brief Introduction of Speaker
Minghui Yang (杨明晖)

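Studied at the College of Chemistry, Sichuan University, from 1987 to 1997, receiving a Ph.D. in science; postdoctoral researcher at the School of Chemistry, Nanjing University, 1997-1997; postdoctoral researcher in the Department of Computational Science, National University of Singapore, 1999-2004; joined the Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, in 2004 as a research professor, with merit-based support from the "Hundred Talents Program". His main research areas are chemical reaction dynamics, molecular spectroscopy, and the development of scientific computing software.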
Email: yangmh@wipm.ac.cn