Compressing Provenance Graphs

Appeared in 3rd USENIX Workshop on the Theory and Practice of Provenance.


The provenance community has built a number of systems to collect provenance, most of which assume that provenance will be retained indefinitely. However, it is not cost-effective to retain provenance information inefficiently. Since provenance can be viewed as a graph, we note the similarities to web graphs and draw upon techniques from the web compression domain to provide our own novel and improved graph compression solutions for provenance graphs. Our preliminary results show that adapting web compression techniques results in a compression ratio of 2.12:1 to 2.71:1, which we can improve upon to reach ratios of up to 3.31:1.

Publication date:
June 2011

Yulai Xie
Kiran-Kumar Muniswamy-Reddy
Darrell D. E. Long
Ahmed Amer
Dan Feng
Zhipeng Tan

Archival Storage
Dynamic Non-Hierarchical File Systems
Ultra-Large Scale Storage

Available for download:

Full text:
Download as PDF

Bibtex entry

  author       = {Yulai Xie and Kiran-Kumar Muniswamy-Reddy and Darrell D. E. Long
and Ahmed Amer and Dan Feng and Zhipeng Tan},
  title        = {Compressing Provenance Graphs},
  booktitle = {3rd USENIX Workshop on the Theory and Practice of Provenance},
  month        = jun,
  year         = {2011},
Last modified 12 Oct 2011