Abstract and subjects
We recently demonstrated that our hashed oct-tree N-body code (HOT) scales to 256k processors on Jaguar at Oak Ridge National Laboratory, reaching 1.79 Petaflops (single precision) on 2 trillion particles. In preliminary studies with NVIDIA Fermi GPUs, we have also achieved single-GPU performance near 1 Tflop (single precision) on our hexadecapole inner loop, and a 2x application speedup by offloading the most computationally intensive part of the code to the GPU.
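For readers unfamiliar with the term, the general idea behind a hashed oct-tree can be sketched as follows: each tree cell is named by a key built from interleaved coordinate bits, and cells are stored in a hash table rather than linked by pointers. This is a minimal illustration only, not HOT's actual code; the function names, the 21-bit default, and the use of a Python dict as the hash table are all assumptions made for the example.

```python
# Illustrative sketch of oct-tree keys stored in a hash table.
# All names here are hypothetical; this is not taken from HOT.

def morton_key(ix, iy, iz, bits=21):
    """Interleave the low `bits` bits of three integer coordinates.

    A leading 1 bit is prepended as a placeholder, so the length of
    the key encodes the depth of the cell in the tree.
    """
    key = 1  # placeholder bit; the root cell has key 1
    for b in range(bits - 1, -1, -1):
        octant = (((ix >> b) & 1) << 2) | (((iy >> b) & 1) << 1) | ((iz >> b) & 1)
        key = (key << 3) | octant
    return key


def parent_key(key):
    """Drop the last three bits to move one level up the tree."""
    return key >> 3


def level(key):
    """Depth of a cell: number of 3-bit groups after the placeholder bit."""
    return (key.bit_length() - 1) // 3


def insert_particle(cells, ix, iy, iz, bits):
    """Count one particle in its leaf cell and in every ancestor cell."""
    key = morton_key(ix, iy, iz, bits)
    while key >= 1:
        cells[key] = cells.get(key, 0) + 1
        key = parent_key(key)


# The dict plays the role of the hash table: key -> particle count.
cells = {}
insert_particle(cells, 5, 3, 6, bits=3)
insert_particle(cells, 5, 3, 7, bits=3)
```

Because a parent's key is obtained by a bit shift, any cell can be looked up directly from its key without following pointers, which is the property that makes a hash table a natural container for the tree.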