Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems

Published as Storage Systems Research Center Technical Report UCSC-SSRC-08-01.

Abstract

As storage systems reach the petabyte scale, it has become increasingly difficult for users and storage administrators to understand and manage their data. File metadata, such as inode and extended attributes, is a valuable source of information that can aid in locating and identifying files, and can also facilitate administrative tasks such as storage provisioning and recovery from backups. Unfortunately, most storage systems have no way to quickly and easily search file metadata at large scale.

To address these issues, we developed Spyglass, an indexing system that efficiently gathers, indexes, and queries file metadata in large-scale storage systems. Our analysis of file metadata from real-world workloads showed that metadata has spatial locality in the storage namespace and that the distribution of metadata is highly skewed. Based on these findings, we designed Spyglass to use index partitioning and signature files to quickly prune the file search space. We also developed techniques to efficiently handle index versioning, facilitating both fast updates and queries across historical indexes. Experiments on systems with up to 300 million files show that the Spyglass prototype is as much as several thousand times faster than current database solutions while requiring only a fraction of the space.
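To illustrate how index partitioning and signature files can prune a search, the following is a minimal sketch and not the Spyglass implementation. It assumes one partition per namespace subtree and a 64-bit signature per partition built by hashing file extensions; the names `Partition`, `build_index`, `search`, and `SIG_BITS`, as well as the choice of CRC32 as the hash, are hypothetical and chosen only for this example.

```python
import zlib
from collections import defaultdict

SIG_BITS = 64  # assumed signature width; not taken from the report


def bit_for(value: str) -> int:
    """Map an attribute value to a single bit using a stable hash."""
    return 1 << (zlib.crc32(value.encode()) % SIG_BITS)


class Partition:
    """Metadata for one namespace subtree plus a signature summarizing it."""

    def __init__(self):
        self.files = []      # list of (path, extension) pairs
        self.signature = 0   # bitwise OR of the bits of all extensions seen

    def add(self, path: str, ext: str) -> None:
        self.files.append((path, ext))
        self.signature |= bit_for(ext)

    def may_contain(self, ext: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return bool(self.signature & bit_for(ext))


def build_index(files, depth=2):
    """Group files into partitions keyed by their first `depth` path components."""
    parts = defaultdict(Partition)
    for path, ext in files:
        key = "/".join(path.split("/")[: depth + 1])
        parts[key].add(path, ext)
    return parts


def search(parts, ext):
    """Return matching paths and the number of partitions actually scanned."""
    hits, scanned = [], 0
    for part in parts.values():
        if not part.may_contain(ext):
            continue  # pruned without touching the partition's file list
        scanned += 1
        hits.extend(p for p, e in part.files if e == ext)
    return hits, scanned
```

Because nearby files in the namespace tend to share attributes, a query for one extension would usually set bits in only a few partition signatures, so most partitions are skipped. A signature can report a false positive when two values hash to the same bit, but it never produces a false negative, so pruning does not lose results.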

Publication date:
May 2008

Authors:
Andrew Leung
Minglong Shao
Timothy Bisson
Shankar Pasupathy
Ethan L. Miller

Projects:
Scalable File System Indexing
Ultra-Large Scale Storage

Available media

Full paper text: PDF

Bibtex entry

@techreport{leung-ssrctr0801,
  author       = {Andrew Leung and Minglong Shao and Timothy Bisson and Shankar Pasupathy and Ethan L. Miller},
  title        = {Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems},
  institution  = {University of California, Santa Cruz},
  number       = {UCSC-SSRC-08-01},
  month        = may,
  year         = {2008},
}

Last modified 5 Aug 2020