Undergraduate Graduation Project (Thesis)
Foreign-Language Translation
GPU acceleration and performance of the particle-beam-dynamics code Elegant
¹Tech-X Corporation, Boulder, CO 80303, USA; ²Argonne National Laboratory, Argonne, IL 60439, USA (Dated: November 22, 2018)
Elegant is an accelerator physics and particle-beam dynamics code widely used for modeling and design of a variety of high-energy particle accelerators and accelerator-based systems. In this paper we discuss a recently developed version of the code that can take advantage of CUDA-enabled graphics processing units (GPUs) to achieve significantly improved performance for a large class of simulations that are important in practice. The GPU version is largely defined by a framework that simplifies implementations of the fundamental kernel types used by Elegant: particle operations, reductions, particle loss, histograms, array convolutions, and random number generation. On the Titan Cray XK7 supercomputer, the GPU-accelerated code runs approximately 6 to 10 times faster than the CPU version using all the CPU cores on the same number of nodes. In addition to performance, maintainability of the GPU-accelerated version of the code was a key design objective. Accuracy with respect to the CPU implementation is also a core consideration: four different methods are used to ensure that the accelerated code faithfully reproduces the CPU results.
PACS numbers: 07.05.Tp
Keywords: Particle-accelerator simulation, GPU acceleration
PROGRAM SUMMARY
Program Title: Kernels from the GPU-accelerated Elegant
Licensing provisions: MIT
Programming language: C/C++/CUDA
Nature of problem: The original Elegant accelerator physics code was designed for central processing units with message-passing interface (MPI) parallelization. This implementation cannot take advantage of next-generation multicore systems.
Solution method: In this package we develop routines based on the CUDA language extensions to C that enable porting the Elegant code to be run on graphics processing units (GPUs). Special consideration is given to algorithms that require collective communication on the GPU.
Additional comments including Restrictions and Unusual features: The full Elegant source code is freely available from Argonne National Laboratory, and later releases of the distribution include the GPU code.
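Particle loss, one of the kernel types named in the abstract, is a good example of an algorithm that typically needs collective communication: once particles are lost at an aperture, the survivors must be packed together so that later kernels keep working on a dense array. The C sketch below shows the scan-then-scatter pattern commonly used for this kind of stream compaction on GPUs, written as serial loops for clarity. It is not taken from Elegant; the function name `compact_particles`, the flat layout with six coordinates per particle, and the `alive` flag array are illustrative assumptions.

```c
#include <stddef.h>

/* Assumed layout: six coordinates per particle stored contiguously,
 * e.g. (x, x', y, y', s, delta). */
#define NCOORD 6

/* Hypothetical helper illustrating the scan-then-scatter pattern that a
 * GPU stream-compaction kernel uses to remove lost particles.
 *   in, out : n*NCOORD doubles (out must not alias in)
 *   alive   : alive[i] != 0 if particle i survived
 *   scan    : scratch space for n exclusive prefix-sum values
 * Returns the number of survivors; they occupy the first
 * (returned value)*NCOORD entries of out, in their original order. */
size_t compact_particles(const double *in, const int *alive, size_t n,
                         double *out, size_t *scan)
{
    /* Pass 1: exclusive prefix sum of the survival flags. A GPU would
     * compute this with a parallel scan; here it is a serial loop. */
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        scan[i] = total;
        total += (alive[i] != 0);
    }

    /* Pass 2: scatter. Iterations are independent, so on a GPU each
     * particle could be handled by its own thread. */
    for (size_t i = 0; i < n; i++) {
        if (alive[i]) {
            for (size_t c = 0; c < NCOORD; c++)
                out[scan[i] * NCOORD + c] = in[i * NCOORD + c];
        }
    }
    return total;
}
```

Pass 1 is the step that depends on every earlier flag and therefore needs collective communication on a GPU; pass 2 is embarrassingly parallel once the scan is known.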
INTRODUCTION
Elegant is an open-source, multi-platform code used for design, simulation, and optimization of a wide variety of high-energy particle accelerators and accelerator-based systems, including free-electron laser (FEL) driver linear accelerators (“linacs”), energy recovery linacs (ERLs), and storage rings [1–3]. The parallel version, Pelegant [4–6], uses MPI for parallelization and shares all source code with the serial version. In a number of settings that include accelerator design optimization, Elegant is used as the tracking component of fully scripted simulations. Elegant is fundamentally a lumped-element particle accelerator tracking code utilizing 6D phase space, and is written mostly in C. A variety of numerical techniques are used for particle propagation, including transport matrices (up to third order), symplectic integration, and adaptive numerical integration. Collective effects are also available, including space charge, coherent synchrotron radiation (CSR), wakefields, and resonant impedances.
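The transport-matrix technique mentioned above can be illustrated at first order: each particle's six phase-space coordinates are multiplied by a 6×6 matrix $R$. The C sketch below is a hypothetical `apply_first_order_map`, not Elegant's own routine; it omits the second- and third-order terms that Elegant also supports, and the coordinate layout and row-major storage of `R` are assumptions.

```c
#include <stddef.h>

/* Assumed layout: six phase-space coordinates per particle, stored
 * contiguously, e.g. (x, x', y, y', s, delta). */
#define NCOORD 6

/* Hypothetical first-order transport: replaces each particle's coordinate
 * vector z with R*z. R is a 6x6 matrix stored row-major, so R[i*NCOORD+j]
 * is the element in row i, column j (0-based). Higher-order terms are
 * omitted. Particles are updated independently, so on a GPU each one
 * could be handled by its own thread. */
void apply_first_order_map(const double *R, double *particles, size_t n)
{
    for (size_t p = 0; p < n; p++) {
        double *z = particles + p * NCOORD;
        double tmp[NCOORD];

        /* Matrix-vector product into a temporary so that every output
         * coordinate is computed from the unmodified input vector. */
        for (size_t i = 0; i < NCOORD; i++) {
            tmp[i] = 0.0;
            for (size_t j = 0; j < NCOORD; j++)
                tmp[i] += R[i * NCOORD + j] * z[j];
        }
        for (size_t i = 0; i < NCOORD; i++)
            z[i] = tmp[i];
    }
}
```

For example, an identity matrix with $R_{12} = R_{34} = L$ (1-based indices) shifts $x$ by $L x'$ and $y$ by $L y'$, which is a linear drift of length $L$ in both transverse planes.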
In recent years, general-purpose computing on graphics processing units (GPUs) has attracted significant interest from the scientific computing community because these devices offer unmatched performance at low cost and at high performance per watt. Unlike general-purpose processors, which devote significant on-chip resources to command and control, pre-fetching, caching, instruction-level parallelism, and instruction cache parallelism, GPUs devote a much larger amount of silicon to maximizing memory bandwidth and raw floating-point computation power. This comes at the expense of shifting the burden away from on-chip command and control logic and toward developers, and it additionally requires relatively large problems with high levels of parallelism.
*Presently at RadiaSoft LLC
†Presently at Sierra Nevada Corporation
One of the challenges of accelerating a code such as Elegant is the sheer number and variety of kernels required to accelerate common use cases. Without reasonable accelerated coverage of the code, the benefits of using the GPU may be severely reduced. This reduction stems both from the time required to transfer the particles between device and host memory whenever the simulation enters a stage that cannot be performed on the GPU, and from the fundamental limit expressed by Amdahl's argument [7]. Amdahl's argument states that if a runtime fraction $F$ of a code is accelerated (threaded) with $n$ concurrent threads, then the maximum speedup is $S_{\max} = \left[(1 - F) + F/n\right]^{-1}$. Thus, accelerating a portion of the code that covers 50% of the runtime yields a speedup of only a factor of two, even with infinitely many threads, a rather modest acceleration. It has been our intent to accelerate a
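Amdahl's bound is easy to evaluate numerically. The C helper below is a minimal sketch (the name `amdahl_speedup` is ours, not Elegant's), assuming $0 \le F < 1$ and $n \ge 1$; it treats $n = \infty$ separately so that the limit $1/(1-F)$ is returned directly. For $F = 0.5$ and $n = \infty$ it returns 2, matching the factor of two quoted in the text.

```c
#include <math.h>

/* Amdahl's argument: if a fraction F of the runtime is spread over n
 * concurrent threads and the remainder stays serial, the best-case
 * speedup is 1 / ((1 - F) + F / n). Assumes 0 <= F < 1 and n >= 1. */
double amdahl_speedup(double F, double n)
{
    if (isinf(n))               /* n -> infinity: only the serial part remains */
        return 1.0 / (1.0 - F);
    return 1.0 / ((1.0 - F) + F / n);
}
```

Calling `amdahl_speedup(0.9, INFINITY)` gives about 10, which shows why broad kernel coverage matters: each remaining serial percent of runtime caps the attainable speedup.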