Tracing and Benchmarking

The SSRC is currently investigating several areas in tracing and benchmarking. These include, but are not limited to, an in-depth analysis of usage behavior in a large-scale scientific archive and a study of evolutionary trends in a tertiary storage environment.

Status

Usage Behavior Analysis: We analyzed three years of file-level activity from the NCAR mass storage system, providing valuable insight into a large-scale scientific archive with over 1600 users, tens of millions of files, and petabytes of data. Our examination of system usage showed that, while a subset of users was responsible for most of the activity, this activity was widely distributed at the file level. We also showed that physically grouping files and directories on media can improve archival storage system performance. Based on our observations, we provide suggestions and guidance both for future scientific archival system designs and for improved tracing of archival activity.
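As a rough illustration of the kind of measurement behind the observation that activity is concentrated among users but spread across files, the sketch below computes the share of events attributable to the most active fraction of users and of files. The record layout (user, file path, operation), the sample data, and the `top_share` helper are assumptions made for this example; they do not reflect the actual NCAR trace format or the analysis code used in the study.

```python
from collections import Counter

def top_share(keys, fraction=0.1):
    """Share of all events attributable to the most active `fraction` of distinct keys."""
    counts = Counter(keys)
    if not counts:
        return 0.0
    # Always consider at least one key, even for very small populations.
    k = max(1, round(len(counts) * fraction))
    top = sum(count for _, count in counts.most_common(k))
    return top / sum(counts.values())

# Hypothetical trace records: (user, file path, operation).
trace = [
    ("alice", "/a/1", "read"),
    ("alice", "/a/2", "write"),
    ("alice", "/a/3", "write"),
    ("alice", "/a/4", "read"),
    ("alice", "/a/5", "write"),
    ("alice", "/a/6", "write"),
    ("bob", "/b/1", "write"),
    ("carol", "/c/1", "read"),
]

# Concentration by user versus concentration by file.
user_share = top_share((user for user, _, _ in trace), fraction=0.34)
file_share = top_share((path for _, path, _ in trace), fraction=0.34)
print(f"top users account for {user_share:.0%} of events")
print(f"top files account for {file_share:.0%} of events")
```

In this toy data a single user dominates the event count, while no individual file is accessed more than once, so the user-level share is much higher than the file-level share.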

Evolutionary Trends: We conducted an in-depth comparison of file migration activity on the mass storage system (MSS) at the National Center for Atmospheric Research (NCAR) during two periods, one in the early 1990s and another nearly twenty years later. The findings indicate that archival behavior has shifted towards a write-heavy workload, and that future archives can be optimized for write activity more heavily than previously believed. Furthermore, it may be worth weighing the value of data at the time it is archived, since later retrieval is increasingly unlikely.

Analysis of Workload Behavior: To provide relevant input for the design of effective long-term data storage systems, we examined the workload behavior of several scientific and historical archives, covering a mixture of purposes, media types, and access models. Our findings show that, for scientific archival storage, files have become larger, but update rates have remained largely unchanged. However, in public content archives, we observed behavior that diverges from the traditional “write-once, read-maybe” behavior of tertiary storage.

The SSRC also maintains a repository of trace data.

Publications

Last modified 23 May 2019