Evolutionary Trends in a Supercomputing Tertiary Storage Environment

Appeared in Proceedings of the 20th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2012).

Abstract

Tracking archival usage and data migration in a long-term supercomputing system is critical to understanding not only how users’ needs and habits have changed over time, but also how the archive itself evolves in response to these external factors. Yet this type of study has not previously been performed. To address this need, we conducted an in-depth comparison of file migration activity on the mass storage system (MSS) at the National Center for Atmospheric Research (NCAR) during two periods, one in the early 1990s and another nearly twenty years later. In addition to confirming earlier findings, our analysis turned up three surprising results. First, the read:write ratio went from 2:1 in the earlier trace to 1:2 in the later trace, a reduction by a factor of four in reads relative to writes. Second, only 30% of the current archive was accessed during the three-year period of the study, in stark contrast to the 80% seen in the 1992 trace analysis. Third, access latency to the first byte of data actually increased despite much faster computers and storage devices. These findings indicate that archival behavior has shifted towards a write-heavy workload, and that future archives can be optimized for write activity more than previously believed. Furthermore, it may be worth considering the value of data at the time it is archived, since later retrieval is increasingly unlikely.
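The factor of four quoted for the read:write shift comes from dividing the earlier ratio by the later one. A minimal sketch of that arithmetic follows; the function name `relative_read_change` is hypothetical and chosen for illustration only, and the inputs are just the two ratios stated in the abstract, not trace data.

```python
from fractions import Fraction


def relative_read_change(before, after):
    """Return how many times smaller the read:write ratio became.

    `before` and `after` are (reads, writes) pairs, e.g. (2, 1) for 2:1.
    Hypothetical helper for illustration; not taken from the paper.
    """
    reads_b, writes_b = before
    reads_a, writes_a = after
    ratio_before = Fraction(reads_b, writes_b)  # reads per write, earlier trace
    ratio_after = Fraction(reads_a, writes_a)   # reads per write, later trace
    return ratio_before / ratio_after


# Ratios as stated in the abstract: 2:1 earlier, 1:2 later.
factor = relative_read_change((2, 1), (1, 2))
print(factor)  # 4
```

Exact fractions are used here rather than floats so that the result is exactly 4, with no rounding.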

Publication date:
August 2012

Authors:
Joel Frank
Ethan L. Miller
Ian Adams
Daniel Rosenthal

Projects:
Archival Storage
Tracing and Benchmarking
Ultra-Large Scale Storage

BibTeX entry

@inproceedings{frank12-mascots,
  author    = {Joel Frank and Ethan L. Miller and Ian Adams and Daniel Rosenthal},
  title     = {Evolutionary Trends in a Supercomputing Tertiary Storage Environment},
  booktitle = {Proceedings of the 20th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2012)},
  month     = aug,
  year      = {2012},
}
Last modified 11 Dec 2013