GPU Behavior on a Large HPC Cluster

Nathan DeBardeleben; Sean Blanchard; Laura Monroe; Phil Romero; Daryl Grunau; Craig Idler; Cornell Wright

doi:10.1007/978-3-642-54420-0_66

Conference proceeding

Peer reviewed

EURO-PAR 2013: PARALLEL PROCESSING WORKSHOPS, Vol.8374, pp.680-689

Lecture Notes in Computer Science

01/01/2014

Abstract and subjects

Computer Science

Computer Science, Theory & Methods

Science & Technology

Technology

We discuss observed characteristics of GPUs deployed as accelerators in an HPC cluster at Los Alamos National Laboratory (LANL). GPUs have a very good theoretical FLOPS rate and are reasonably inexpensive and available, but they are relatively new to HPC, which demands both consistently high performance across nodes and a consistently low error rate. We modified a standard acceptance procedure to test GPU performance, error rate, and reliability characteristics, and ran the test suite on a Fermi HPC cluster at LANL. We discuss our methodology for this testing and present results relevant to the deployment of GPUs in an HPC environment. We show performance variability, power usage variability (possibly related), and some reliability concerns on the GPUs tested. We argue for rigorous testing of these devices in deployment as a way of characterizing their behavior.

Details

Title: GPU Behavior on a Large HPC Cluster
Authors/Creators: Nathan DeBardeleben; Sean Blanchard; Laura Monroe; Phil Romero; Daryl Grunau; Craig Idler; Cornell Wright
Contributors: D A Mey; M Alexander; P Bientinesi; M Cannataro; C Clauss; A Costan; G Kecskemet; C Morin; L Ricci; J Sahuquillo; M Schulz; Scarano; S L Scott; J Weidendorfer
Publication Details: EURO-PAR 2013: PARALLEL PROCESSING WORKSHOPS, Vol.8374, pp.680-689
Series: Lecture Notes in Computer Science
Publisher: Springer Nature
Number of pages: 10
Language: English
Resource Type: Conference proceeding
ISBN: 3642544193; 9783642544194; 9783642544200; 3642544207
DOI: https://doi.org/10.1007/978-3-642-54420-0_66
Publication ISSN: 0302-9743; 1611-3349