Duplicate Data Elimination in a SAN File System

Appeared in Proceedings of the Twenty-first Symposium on Mass Storage Systems (MSS).

Abstract


Duplicate Data Elimination (DDE) is our method for identifying and coalescing identical data blocks in Storage Tank, a SAN file system. On-line file systems pose a unique set of performance and implementation challenges for this feature, and existing techniques for improving storage and network utilization do not satisfy these constraints. Our design combines content hashing, copy-on-write, and lazy updates to achieve its functional and performance goals. DDE executes primarily as a background process. The design also builds on Storage Tank's FlashCopy function to ease implementation.
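
As a rough illustration of how content hashing, copy-on-write, and lazy updates can fit together, the Python sketch below models a toy in-memory block store. It is not Storage Tank's implementation: the class `DedupStore`, its methods, the SHA-1 digest, and the 4 KB default block size are all assumptions chosen for illustration. Writes always land on fresh private blocks (copy-on-write, so a shared block is never modified in place), and a separate `coalesce()` pass, standing in for the background process, later hashes those blocks and merges identical ones using reference counts. A byte comparison guards against hash collisions.

```python
import hashlib


class DedupStore:
    """Hypothetical toy block store with lazy, hash-based block coalescing."""

    def __init__(self, block_size=4096):  # assumed default block size
        self.block_size = block_size
        self.blocks = {}      # physical id -> block bytes
        self.refcount = {}    # physical id -> number of logical references
        self.index = {}       # content digest -> canonical physical id
        self.digest_of = {}   # physical id -> digest (indexed blocks only)
        self.files = {}       # file name -> list of physical ids
        self.dirty = set()    # blocks written but not yet hashed
        self._next_id = 0

    def _alloc(self, data):
        pid = self._next_id
        self._next_id += 1
        self.blocks[pid] = data
        self.refcount[pid] = 1
        self.dirty.add(pid)  # hashing is deferred (lazy update)
        return pid

    def _release(self, pid):
        self.refcount[pid] -= 1
        if self.refcount[pid] == 0:
            del self.blocks[pid]
            del self.refcount[pid]
            self.dirty.discard(pid)
            digest = self.digest_of.pop(pid, None)
            if digest is not None and self.index.get(digest) == pid:
                del self.index[digest]

    def write_file(self, name, data):
        for pid in self.files.pop(name, []):
            self._release(pid)
        self.files[name] = [self._alloc(data[off:off + self.block_size])
                            for off in range(0, len(data), self.block_size)]

    def write_block(self, name, i, data):
        # Copy-on-write: redirect the file to a new block, then drop the old reference.
        old = self.files[name][i]
        self.files[name][i] = self._alloc(data)
        self._release(old)

    def read_file(self, name):
        return b"".join(self.blocks[pid] for pid in self.files[name])

    def coalesce(self):
        """Background pass: hash dirty blocks and merge duplicates."""
        for pids in self.files.values():
            for i, pid in enumerate(pids):
                if pid not in self.dirty:
                    continue
                data = self.blocks[pid]
                digest = hashlib.sha1(data).digest()
                canon = self.index.get(digest)
                if canon is None:
                    self.index[digest] = pid
                    self.digest_of[pid] = digest
                elif self.blocks[canon] == data:
                    self.refcount[canon] += 1  # share the canonical block
                    pids[i] = canon
                    self._release(pid)         # frees the duplicate
                    continue
                # New unique block, or a hash collision: keep it as-is.
                self.dirty.discard(pid)
```

For example, with a 4-byte block size, files `b"AAAABBBBAAAA"` and `b"AAAACCCC"` occupy five physical blocks until `coalesce()` runs, after which only three remain (`AAAA`, `BBBB`, `CCCC`). Overwriting the first block of one file then allocates a new block and leaves the other file's shared copy untouched.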

We include an analysis of selected real-world data sets aimed at demonstrating the space-saving potential of coalescing duplicate data. Our results show that DDE can reduce storage consumption by up to 80% in some application environments. The analysis also explores additional factors, such as the impact of varying the file block size and the contribution of whole-file duplication to the net savings.
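
The two factors mentioned above can be measured in a simple way. The Python sketch below is an assumed methodology, not the one used in the paper, and it does not reproduce the paper's results: `block_savings` reports the fraction of bytes eliminated by coalescing identical fixed-size blocks at a given block size, and `whole_file_savings` reports the fraction attributable to files that are exact copies of an earlier file. Both function names and the SHA-256 digest are illustrative choices.

```python
import hashlib
from typing import Dict


def block_savings(files: Dict[str, bytes], block_size: int) -> float:
    """Fraction of bytes saved by storing each distinct fixed-size block once."""
    total = 0
    unique = {}  # digest -> length of that distinct block
    for data in files.values():
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            total += len(block)
            unique.setdefault(hashlib.sha256(block).digest(), len(block))
    return 1 - sum(unique.values()) / total if total else 0.0


def whole_file_savings(files: Dict[str, bytes]) -> float:
    """Fraction of bytes saved by storing each distinct whole file once."""
    total = sum(len(data) for data in files.values())
    seen = set()
    saved = 0
    for data in files.values():
        digest = hashlib.sha256(data).digest()
        if digest in seen:
            saved += len(data)  # exact copy of an earlier file
        else:
            seen.add(digest)
    return saved / total if total else 0.0


files = {"a": b"AAAABBBB", "b": b"AAAABBBB", "c": b"AABBAABB"}
for bs in (2, 4):
    print(bs, round(block_savings(files, bs), 3))  # prints "2 0.833", then "4 0.5"
print(round(whole_file_savings(files), 3))         # prints 0.333
```

In this toy input, smaller blocks expose more repeated content (file `c` shares 2-byte blocks with `a` but no 4-byte blocks), while whole-file matching only credits the exact copy `b`. This illustrates why block size and whole-file duplication are worth separating in the analysis.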

Publication date:
April 2004

Authors:
Bo Hong
Demyn Plantenberg
Darrell D. E. Long
Miriam Sivan-Zimet

Projects:
Secure Networks


Bibtex entry

@inproceedings{MSST-Hong-2004,
  author       = {Bo Hong and Demyn Plantenberg and Darrell D. E. Long and Miriam Sivan-Zimet},
  title        = {Duplicate Data Elimination in a {SAN} File System},
  booktitle    = {Proceedings of the Twenty-first Symposium on Mass Storage Systems (MSS)},
  month        = apr,
  year         = {2004},
}
Last modified 15 Jan 2023