Logical Data Deletion in High-Performance De-Duplication Backup

Abstract:

Data de-duplication divides a backup stream into chunks and eliminates duplicate chunks across the entire system, remarkably reducing the storage and bandwidth requirements of backups. However, the technique also introduces new problems: the performance problem has been addressed by many existing solutions, whereas the logical data deletion problem has not been well studied. This paper studies the logical data deletion mechanism in de-duplication backup systems, analyzes the memory overhead of a Bloom filter that supports both high-performance de-duplication and logical data deletion, and proposes a lazy deletion method that minimizes the influence of logical data deletion on de-duplication performance.
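The preview does not include the paper's actual data structures, so the following is only a minimal Python sketch of the general idea the abstract describes: a counting Bloom filter over chunk fingerprints (a plain Bloom filter cannot support deletion), together with a hypothetical LazyDeleter that queues logical deletions and applies them to the filter in a batch, off the critical de-duplication path. All class, method, and parameter names here (CountingBloomFilter, LazyDeleter, num_slots, num_hashes) are invented for illustration and are not taken from the paper.

```python
# Sketch only: counting Bloom filter over chunk fingerprints, with lazy deletion.
import hashlib

class CountingBloomFilter:
    def __init__(self, num_slots=1 << 20, num_hashes=4):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        # Per-slot counters; a real system would pack these into a few bits each.
        self.counters = [0] * num_slots

    def _slots(self, fingerprint: bytes):
        # Derive k slot indices from the chunk fingerprint.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(fingerprint + bytes([i])).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_slots

    def add(self, fingerprint: bytes):
        for slot in self._slots(fingerprint):
            self.counters[slot] += 1

    def might_contain(self, fingerprint: bytes) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.counters[slot] > 0 for slot in self._slots(fingerprint))

    def remove(self, fingerprint: bytes):
        # Decrementing counters is what makes deletion possible at all.
        for slot in self._slots(fingerprint):
            if self.counters[slot] > 0:
                self.counters[slot] -= 1


class LazyDeleter:
    """Queues chunk deletions and applies them in a batch, off the backup path."""
    def __init__(self, bloom: CountingBloomFilter):
        self.bloom = bloom
        self.pending = []

    def mark_deleted(self, fingerprint: bytes):
        # O(1) during logical deletion; the filter is not touched yet.
        self.pending.append(fingerprint)

    def flush(self):
        # Apply all queued deletions at once, e.g. during idle time.
        for fp in self.pending:
            self.bloom.remove(fp)
        self.pending.clear()
```

As a usage example, a backup run would call might_contain() on each chunk fingerprint to skip duplicates and add() for new chunks, while deleting a backup would only call mark_deleted() for its chunks and defer flush() until the system is idle, which is the spirit of the lazy deletion the abstract mentions.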

Info:

Pages:

2519-2523

Online since:

June 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
