GPU-Accelerated Computation of Electronic Response Functions: Foreign-Literature Translation Material

 2022-12-29 01:12

Undergraduate Graduation Project (Thesis)

Foreign-Literature Translation

GPU acceleration and performance of the particle-beam-dynamics code Elegant

J.R. King,1 I.V. Pogorelov,1,∗ K.M. Amyx,1,† M. Borland,2 and R. Soliday2

1Tech-X Corporation, Boulder CO 80303, USA 2Argonne National Laboratory, Argonne, IL 60439, USA (Dated: November 22, 2018)

Elegant is an accelerator physics and particle-beam dynamics code widely used for modeling and design of a variety of high-energy particle accelerators and accelerator-based systems. In this paper we discuss a recently developed version of the code that can take advantage of CUDA-enabled graphics processing units (GPUs) to achieve significantly improved performance for a large class of simulations that are important in practice. The GPU version is largely defined by a framework that simplifies implementations of the fundamental kernel types that are used by Elegant: particle operations, reductions, particle loss, histograms, array convolutions and random number generation. Accelerated performance on the Titan Cray XK-7 supercomputer is approximately 6-10 times better with the GPU than all the CPU cores associated with the same node count. In addition to performance, the maintainability of the GPU-accelerated version of the code was considered a key design objective. Accuracy with respect to the CPU implementation is also a core consideration. Four different methods are used to ensure that the accelerated code faithfully reproduces the CPU results.

PACS numbers: 07.05.Tp

Keywords: Particle-accelerator simulation, GPU acceleration

PROGRAM SUMMARY

Program Title: Kernels from the GPU-accelerated Elegant

Licensing provisions: MIT

Programming language: C/C++/CUDA

Nature of problem: The original design of the Elegant accelerator physics code was implemented on central processing units with message-passing interface parallelization. This implementation is not able to use next-generation multicore systems.

Solution method: In this package we develop routines based on the CUDA language extensions to C that enable porting the Elegant code to be run on graphics processing units (GPUs). Special consideration is given to algorithms that require collective communication on the GPU.

Additional comments including Restrictions and Unusual features: The full Elegant source code is freely available from Argonne National Laboratory and these distributions include the GPU code in the later releases.

1. INTRODUCTION

Elegant is an open-source, multi-platform code used for design, simulation, and optimization of a wide variety of high-energy particle accelerators and accelerator-based systems, including free-electron laser (FEL) driver linear accelerators (“linacs”), energy recovery linacs (ERLs), and storage rings [1–3]. The parallel version, Pelegant [4–6], uses MPI for parallelization and shares all source code with the serial version. In a number of settings that include accelerator design optimization, Elegant is used as the tracking component of fully scripted simulations. Elegant is fundamentally a lumped-element particle accelerator tracking code utilizing 6D phase space, and is written mostly in C. A variety of numerical techniques are used for particle propagation, including transport matrices (up to third order), symplectic integration, and adaptive numerical integration. Collective effects are also available, including space charge, coherent synchrotron radiation (CSR), wakefields, and resonant impedances.

In recent years, general purpose computing on graphics processing units (GPUs) has attracted significant interest from the scientific computing community because these devices offer unmatched performance at low cost and at high performance per watt. Unlike general purpose processors, which devote significant on-chip resources to command and control, pre-fetching, caching, instruction-level parallelism, and instruction cache parallelism, GPUs devote a much larger amount of silicon to maximizing memory bandwidth and raw floating-point computation power. This comes at the expense of shifting the burden towards developers and away from on-chip command and control logic, and additionally requires relatively large problems with high levels of parallelism.

∗Presently at RadiaSoft LLC

†Presently at Sierra Nevada Corporation

One of the challenges of accelerating a code such as Elegant is the sheer number and variety of kernels required to accelerate common use cases. Without reasonable accelerated coverage of the code, the benefits of using the GPU may be severely reduced. This reduction arises both from the time required to transfer the particles between device and host memory when entering a stage of a simulation that cannot be performed on the GPU, and from the fundamental limit expressed by Amdahl's argument [7]. Amdahl's argument states that if a runtime fraction, $F$, of a code is accelerated (threaded) with $n$ concurrent threads, then the maximum speedup is $\left((1 - F) + F/n\right)^{-1}$. Thus, the speedup from an accelerated portion of a code that covers a runtime fraction of 50% with infinite threads is only a factor of two, a rather modest acceleration. It has been our intent to accelerate a
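The bound in Amdahl's argument can be checked numerically. The following is a minimal sketch in C, assuming the standard form of the bound, speedup = 1/((1 - F) + F/n), where F is the accelerated runtime fraction and n the number of concurrent threads. The function names `amdahl_speedup` and `amdahl_limit` are hypothetical and are not part of Elegant.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not from Elegant): Amdahl's upper bound on speedup.
 * accelerated_fraction: runtime fraction F that is accelerated, in [0, 1].
 * threads: number of concurrent threads n, at least 1 (may be INFINITY). */
double amdahl_speedup(double accelerated_fraction, double threads)
{
    assert(accelerated_fraction >= 0.0 && accelerated_fraction <= 1.0);
    assert(threads >= 1.0);

    double serial_part = 1.0 - accelerated_fraction;        /* runs unaccelerated */
    double parallel_part = accelerated_fraction / threads;  /* shrinks with n */
    return 1.0 / (serial_part + parallel_part);
}

/* Limit of the bound as n grows without bound: 1 / (1 - F). */
double amdahl_limit(double accelerated_fraction)
{
    return amdahl_speedup(accelerated_fraction, INFINITY);
}
```

With F = 0.5, `amdahl_limit(0.5)` evaluates to 2, matching the factor of two quoted in the text. Raising the accelerated fraction to F = 0.9 lifts the limit to 10, which suggests why broad kernel coverage matters more than the speed of any single kernel.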
