Abstract and subjects
This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture (TM) in conjunction with x86 Opteron (TM) processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.