Exploring the Reference Management in Parallel De-Duplication

Abstract:

With the explosion of digital information on networks, data de-duplication has been widely adopted in backup systems to improve space efficiency. When only unique data segments are stored and shared among backup files, the reference information between files and their data segments becomes increasingly important for tracking data usage and reclaiming freed space. However, on multi-core and many-core processors, the general Group Mark-and-Sweep method performs poorly under concurrent reference updates due to synchronization overhead. To alleviate this challenge, a Parallel Mark-and-Sweep mechanism has been developed, building on research into DHT and similarity methods. In our experiments with real-world datasets, it shows better performance than the sequential method.
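The full mechanism is not shown in this preview, so the following Python is only an illustrative sketch of the idea the abstract describes, not the authors' implementation: hash-partition the fingerprint space DHT-style so that each worker marks a disjoint shard of reference records without synchronization, then sweep unmarked segments. All names here (shard_of, mark_shard, NUM_SHARDS) and the data layout are assumptions, and Python threads merely stand in for real parallel workers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count; a real system would match core count

def shard_of(fp: bytes) -> int:
    # DHT-style placement: each fingerprint deterministically maps to exactly
    # one shard, so workers marking different shards never share state.
    return int.from_bytes(hashlib.sha1(fp).digest()[:4], "big") % NUM_SHARDS

def mark_shard(shard_id: int, live_files: dict) -> set:
    # Mark phase for one shard: record every segment fingerprint referenced
    # by a live backup file that hashes into this shard. No locks needed.
    marked = set()
    for segment_fps in live_files.values():
        for fp in segment_fps:
            if shard_of(fp) == shard_id:
                marked.add(fp)
    return marked

def parallel_mark_and_sweep(live_files: dict, stored_segments: set) -> set:
    # Run one marker per shard in parallel, then sweep: any stored segment
    # left unmarked is unreferenced and its space can be reclaimed.
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        marks = pool.map(mark_shard, range(NUM_SHARDS),
                         [live_files] * NUM_SHARDS)
        marked = set().union(*marks)
    return stored_segments - marked

# Example: segment c3 is stored but no longer referenced, so it is swept.
live = {"backup1": [b"c1", b"c2"], "backup2": [b"c2"]}
store = {b"c1", b"c2", b"c3"}
print(parallel_mark_and_sweep(live, store))  # -> {b'c3'}
```

Because shard_of routes each fingerprint to exactly one worker, the mark phase needs no locks or atomic reference counters; that lock-free partitioning is what lets a parallel version avoid the synchronization overhead the abstract attributes to the general Group Mark-and-Sweep method.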

Info:

Pages:

236-239

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] The Digital Universe Decade – Are You Ready?, An IDC Analysis Report, http://www.emc.com/collateral/demos/microsites/idc-digital-universe/iview.html.

[2] NetApp Deduplication (ASIS). http://www.netapp.com/us/products/platform-os/dedup.html.

[3] S. Quinlan and S. Dorward. Venti: A New Approach to Archival Storage. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST'02), Monterey, CA, January 2002.

[4] S. Rhea, R. Cox, and A. Pesterev. Fast, Inexpensive Content-Addressed Storage in Foundation. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC'08), Boston, MA, June 2008.

[5] U. Manber. Finding Similar Files in a Large File System. Technical Report TR 93-33, Department of Computer Science, University of Arizona, October 1993; also in Proceedings of the USENIX Winter 1994 Technical Conference, 17-21, 1994.

[6] F. Guo and P. Efstathopoulos. Building a High-Performance Deduplication System. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC'11), 2011.

[7] D. Bhagwat, K. Eshghi, D. D. E. Long, and M. Lillibridge. Extreme Binning: Scalable, Parallel Deduplication for Chunk-Based File Backup. In Proceedings of the 17th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'09), London: IEEE Press, 2009, pp. 1-9. DOI: 10.1109/MASCOT.2009.5366623.