Output list
Conference proceeding
FleCSI 2.0: The Flexible Computational Science Infrastructure Project
Published 01/01/2022
EURO-PAR 2021: PARALLEL PROCESSING WORKSHOPS, 13098, 480 - 495
The FleCSI 2.0 programming system supports multiphysics application development through a runtime abstraction layer, and by providing core topology types that can be customized for specific numerical methods. The abstraction layer provides a single-source programming interface for distributed and shared-memory data parallelism through task and kernel execution, and has been demonstrated to introduce virtually no runtime overhead. FleCSI's core topology types represent a rich set of basic data structures that can be specialized to create application-facing interfaces for a variety of different physics packages. Using the FleCSI control and data models, it is straightforward to compose multiple packages to create full multiphysics applications. When used with a task-based backend, FleCSI offers extended runtime analysis that can increase task concurrency, facilitate load balancing, and allow for portability across heterogeneous computing architectures.
Conference proceeding
FleCSPH: a Parallel and Distributed Smoothed Particle Hydrodynamics Framework Based on FleCSI
Published 01/01/2018
PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 484 - 491
FleCSPH complements the FleCSI framework, focusing on tree data structures with support for binary trees, quadtrees, and octrees. The framework provides parallel, distributed, and accelerated tree construction and search in the context of multi-physics problems. FleCSI is a compile-time configurable framework designed to support multi-physics applications and is developed and maintained by Los Alamos National Laboratory. FleCSI provides domain scientists with a set of data structures and tools to target parallel and distributed architectures on current and future supercomputers, including the ongoing 2020 target to support the first Exascale supercomputers. Our work on FleCSPH is based on Smoothed Particle Hydrodynamics (SPH), a method that emphasizes different walls in HPC. SPH can be solved efficiently using binary trees, quadtrees, and octrees, while exhibiting irregular computation and communication. This paper is organized as follows. The introduction describes the SPH method and the reasons that make it a good test case for the FleCSPH framework, and gives more details on the FleCSI framework. The second part is dedicated to the tree data structure itself and the choices we made for the domain decomposition, the tree construction, and the search; it also describes our distribution strategies and their reliability to the FleCSI model. The third part describes our test cases and the current results of the application. The test cases are the Sod shock tube, the Sedov blast, and 2D/3D fluid flows.
Conference proceeding
Taking Lessons Learned from a Proxy Application to a Full Application for SNAP and PARTISN
Published 01/01/2017
INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS 2017), 108, 555 - 565
SNAP is a proxy application which simulates the computational motion of a neutral particle transport code, PARTISN. In this work, we have adapted parts of SNAP separately; we have re-implemented the iterative shell of SNAP in the task-model runtime Legion, showing an improvement to the original schedule, and we have created multiple Kokkos implementations of the computational kernel of SNAP, displaying similar performance to the native Fortran. We then translate our Kokkos experiments in SNAP to PARTISN, necessitating engineering development, regression testing, and further thought. (C) 2017 The Authors. Published by Elsevier B.V.
Conference proceeding
Poster: The Hashed Oct-Tree N-Body Algorithm at a Petaflop
Published 11/2012
2012 SC Companion: High Performance Computing, Networking Storage and Analysis, 1442 - 1442
We have recently demonstrated our hashed oct-tree N-body code (HOT) scaling to 256k processors on Jaguar at Oak Ridge National Laboratory with a performance of 1.79 Petaflops (single precision) on 2 trillion particles. We have additionally performed preliminary studies with NVIDIA Fermi GPUs, achieving single GPU performance on our hexadecapole inner loop near 1 Tflop (single precision) and application performance speedup of 2x by offloading the most computationally intensive part of the code to the GPU.
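To illustrate the "hashed oct-tree" idea named in this abstract, the following is a minimal Python sketch. It assumes that each cell is identified by a Morton-style key (the bits of quantized x, y, z coordinates interleaved, with a leading placeholder bit marking the tree level) and that cells are stored in a hash table keyed by that value. The function names, the bit width, and the per-cell particle count are hypothetical choices for illustration and are not taken from the HOT code.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integer coordinates.

    A leading 1 bit is prepended so that the key length encodes the
    tree level (the root cell has key 1). Illustrative assumption only.
    """
    key = 1  # placeholder bit marking the root
    for b in range(bits - 1, -1, -1):
        octant = (((ix >> b) & 1) << 2) | (((iy >> b) & 1) << 1) | ((iz >> b) & 1)
        key = (key << 3) | octant
    return key


def parent_key(key):
    """Drop the last octant (3 bits) to obtain the parent cell's key."""
    return key >> 3


def build_hashed_tree(points, bits=10):
    """Count particles per cell, storing cells in a dict keyed by Morton key.

    `points` is an iterable of (x, y, z) tuples with each coordinate in [0, 1).
    """
    cells = {}
    scale = 1 << bits
    for x, y, z in points:
        key = morton_key(int(x * scale), int(y * scale), int(z * scale), bits)
        # Walk from the leaf up to the root, incrementing every ancestor.
        while key >= 1:
            cells[key] = cells.get(key, 0) + 1
            key = parent_key(key)
    return cells
```

Because a parent key is obtained by a single shift, any cell can be looked up in the hash table without traversing pointers from the root, which is the property this sketch is meant to show.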
Conference proceeding
Advances in petascale kinetic plasma simulation with VPIC and Roadrunner
Published 01/01/2009
SCIDAC 2009: SCIENTIFIC DISCOVERY THROUGH ADVANCED COMPUTING, 180, 1, 012055
VPIC [1], a first-principles 3D electromagnetic charge-conserving relativistic kinetic particle-in-cell code, was recently adapted to run on Los Alamos's Roadrunner [2], the first supercomputer to break a petaflop (10^15 floating point operations per second) in the TOP500 supercomputer performance rankings [3]. We summarize VPIC's modeling capabilities, VPIC's optimization techniques and Roadrunner's computational characteristics. We then discuss three applications enabled by VPIC's unprecedented performance on Roadrunner: modeling laser plasma interaction in upcoming inertial confinement fusion experiments at the National Ignition Facility, modeling short-pulse laser GeV ion acceleration and modeling reconnection in space and laboratory plasmas.
Conference proceeding
0.374 Pflop/s trillion-particle kinetic modeling of laser plasma interaction on roadrunner
Published 11/2008
2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 1 - 11
We demonstrate the outstanding performance and scalability of the VPIC kinetic plasma modeling code on the heterogeneous IBM Roadrunner supercomputer at Los Alamos National Laboratory. VPIC is a three-dimensional, relativistic, electromagnetic, particle-in-cell (PIC) code that self-consistently evolves a kinetic plasma. VPIC simulations of laser plasma interaction were conducted at unprecedented fidelity and scale (up to 1.0 × 10^12 particles on as many as 136 × 10^6 voxels) to model accurately the particle trapping physics occurring within a laser-driven hohlraum in an inertial confinement fusion experiment. During a parameter study of laser reflectivity as a function of laser intensity under experimentally realizable hohlraum conditions, we measured sustained performance exceeding 0.374 Pflop/s (s.p.) with the inner loop itself achieving 0.488 Pflop/s (s.p.). Given the increasing importance of data motion limitations, it is notable that this was measured in a PIC calculation, a technique that typically requires more data motion per computation than other techniques (such as dense matrix calculations, molecular dynamics N-body calculations and Monte-Carlo calculations) often used to demonstrate supercomputer performance. This capability opens up the exciting possibility of using VPIC to model, from first-principles, an issue critical to the success of the multi-billion dollar DOE/NNSA National Ignition Facility.