Examining Extended and Scientific Metadata for Scalable Index Designs

Appeared in Proceedings of the 6th International Systems and Storage Conference (SYSTOR 2013).

Abstract

The sheer volume of modern data makes manual file management impractical. Search-oriented file systems, in which data and metadata are indexed for fast search, are increasingly viewed as a necessity everywhere from desktops to HPC. However, current techniques have been designed and tested for file system metadata, such as POSIX metadata, and fail to account for the wide variety of metadata users would like to search. In particular, the scientific community has been vocal about its desire to search extended and content metadata. While file system metadata is well characterized by a variety of workload studies, scientific metadata is far less well understood. We characterize scientific metadata in order to better understand its implications for index design. We demonstrate that previously suggested index structures, such as k-d trees, R-trees, and row-major databases, are not well suited to scientific metadata. Finally, we provide suggestions for a system design based on our findings.

Publication date:
June 2013

Authors:
Aleatha Parker-Wood
Brian Madden
Michael McThrow
Darrell D. E. Long
Ian Adams
Avani Wildani

Projects:
Scalable File System Indexing
Dynamic Non-Hierarchical File Systems

BibTeX entry:

@inproceedings{parkerwood13-systor,
  author    = {Aleatha Parker-Wood and Brian Madden and Michael McThrow and
               Darrell D. E. Long and Ian Adams and Avani Wildani},
  title     = {Examining Extended and Scientific Metadata for Scalable Index
               Designs},
  booktitle = {Proceedings of the 6th International Systems and Storage
               Conference (SYSTOR 2013)},
  month     = jun,
  year      = {2013},
}

Last modified 8 Nov 2013