We have several active and past projects in archival storage, all of which are contributing to the ability to build more efficient, reliable, and secure long-term storage systems. In addition, we maintain a wiki page with links to resources on archival storage systems.
- Archival Workload Studies: We have produced several detailed studies of archival storage user behavior and system evolution. Our studies provide relevant, up-to-date observations on archival system usage patterns to guide and validate future archival storage designs. Key results include weakening the oft-quoted "Write-Once, Read-Maybe" assumption and finding that the vast majority of archival traffic comes from purely automated sources.
- Improving Trace Analysis: Our experience analyzing long-term traces has highlighted shortcomings in current tracing and analysis techniques. We are using that experience to design new techniques and "best practices" to improve future traces and analyses, such as combining traces with metadata snapshots to improve understanding of system state over time, and techniques for distinguishing between logger failures and full system crashes when activity rates appear unusually low.
- Economic Modeling of Long-Term Storage: Understanding the economics of long-term preservation is necessary because of tremendous data growth, the slowdown in storage density growth, uncertainty in financial investment market conditions, and an increasing need for data preservation. Current business models rely on continuous storage density growth and hence a steadily declining cost per byte. Given the slowdown in density growth, the use of disks for long-term preservation needs to be reconsidered. Despite their low upfront cost, disks are expensive in the long term because of their high operational costs. It is time to look for alternative technologies that are more cost-effective than disk over the long term.
- Secure and Searchable Long-Term Storage: As humanity generates ever-increasing amounts of data that must be stored for decades, we must both protect the data from disclosure and allow users to find information. Since long-term storage can be compromised by a single site or person, we distribute data across multiple archive sites, using techniques derived from POTSHARDS. We are investigating techniques that allow this data to be searched without revealing search terms, or even significant correlations between documents, to archive managers. This provides the level of privacy necessary for long-term storage of medical records, sensitive corporate and government data, and personal information such as videos and photos.
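The idea of distributing data across multiple archive sites so that no single site can disclose it can be illustrated with secret splitting. The following is a minimal sketch only, assuming a simple XOR-based n-of-n split; it is not necessarily the scheme used in our systems, and the function names `split_secret` and `combine_shares` are hypothetical.

```python
import secrets


def split_secret(data: bytes, n_sites: int) -> list[bytes]:
    """Split data into n_sites shares; all shares are needed to rebuild it.

    Assumption: XOR-based n-of-n splitting, chosen here for brevity.
    """
    if n_sites < 2:
        raise ValueError("need at least two sites to split a secret")
    # The first n-1 shares are pure random bytes of the same length as data.
    shares = [secrets.token_bytes(len(data)) for _ in range(n_sites - 1)]
    # The last share is the data XORed with every random share.
    last = data
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    shares.append(last)
    return shares


def combine_shares(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the original data."""
    result = bytes(len(shares[0]))  # all-zero buffer of the share length
    for share in shares:
        result = bytes(a ^ b for a, b in zip(result, share))
    return result


if __name__ == "__main__":
    record = b"example sensitive record"
    pieces = split_secret(record, 3)  # one piece per archive site
    assert combine_shares(pieces) == record
```

In this sketch, any subset of fewer than all shares is indistinguishable from random bytes, so a single compromised site learns nothing about the record. The cost is that losing any one share loses the data, which is why practical designs usually add redundancy or use threshold schemes.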
Logan: a management system to scalably grow, maintain, and evolve a heterogeneous archival storage system.
Computation-Storage Trade-off: using provenance to reduce storage overhead by storing intermediate and initial inputs and recomputing a dataset on demand.
Pergamum: long-term evolvable storage built from intelligent network-attached bricks with both disk and NVRAM such as flash.
Deep Store: building more efficient archival storage using deduplication to take advantage of intra-file and inter-file redundancy.
POTSHARDS: long-term secure storage, which allows the secure preservation of data for decades without relying upon traditional encryption to prevent information leakage.
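The deduplication idea mentioned for Deep Store above can be sketched as a content-addressed chunk store. This is an illustrative simplification, not the Deep Store design: it assumes fixed-size chunks and SHA-256 digests, and the class name `ChunkStore` and its methods are hypothetical.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}     # digest -> chunk contents
        self.files: dict[str, list[str]] = {}  # file name -> ordered digests

    def put(self, name: str, data: bytes) -> None:
        """Store a file as a list of chunk digests (its 'recipe')."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first copy of a given chunk is actually stored.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        """Rebuild a file by concatenating its chunks in order."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        """Total bytes of unique chunk data held by the store."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore(chunk_size=4)
    store.put("a", b"AAAABBBBAAAA")  # repeated chunk inside one file
    store.put("b", b"BBBBCCCC")      # chunk shared with file "a"
    assert store.get("a") == b"AAAABBBBAAAA"
    print(store.stored_bytes(), "bytes stored for 20 bytes of input")
```

Here the repeated chunk within file "a" captures intra-file redundancy, and the chunk shared between "a" and "b" captures inter-file redundancy. Real systems typically use content-defined chunk boundaries and other techniques so that small edits do not shift every subsequent chunk.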
- Ian Adams
- Kevin Greenan
- Keren Jin
- Brian Madden
- Joerg Meyer
- Daniel Rosenthal
- Mark W. Storer
- Avani Wildani
- Lawrence You