Abstract and subjects
Modern High-Performance Computing (HPC) data centers routinely store massive data sets resulting in millions of directories and billions of files. To efficiently search and sift through these files and directories we present the Grand Unified File Index (GUFI), a novel file system metadata index that enables both privileged and regular users to rapidly locate and characterize data sets of interest. GUFI uses a hierarchical index that preserves file access permissions such that the index can be securely accessed by users while still enabling efficient, advanced analysis of storage system usage by cluster administrators. Compared with the current state-of-the-art indexing for file system metadata, GUFI is able to provide speedups of 1.5× to 230× for queries executed by administrators on a real production file system namespace. Queries executed by users, which typically cannot rely on cluster-wide indexing, see even greater speedups using GUFI.