-
Notifications
You must be signed in to change notification settings - Fork 1
Migratory tar: a new tar format that deduplicates better than the original tar format
License
xinglin/mtar
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
================================================================================ This is a modified version of tar that supports transformation between tar and mtar. The tar format wasn't designed with deduplication in mind and thus does not deduplicate well. To solve this problem, we propose a new format mtar (short for Migratory tar). One can transform a tar file into mtar, by migrating a tar file. An mtar file can be restored into a tar file. For more details, please check out our paper (Metadata Considered Harmful ... to Deduplication), published at hotstorage15. What has been changed? We added a few files that does the migration and restoration. They are migrate.c and restore.c, included in the src dir. Both of them are based on extract.c (which implements extraction of a tar file). migrate.c scans a tar file and outputs data blocks. The header blocks are stored in a temporary file (.h) and then appended to the output, to build the final mtar file. restore.c works in a similar way but in a reverse direction as migration. It reads the first block of a mtar file, from which it can extract the number of header blocks. Then, it seeks to the first header block and start to reconstruct the original tar file, by outputing a header block and then data blocks for the first data file. This process repeats until all header blocks are processed. In addition, we added filter.c, which removes the padded blocks at the end of a tar file. Padding was not implemented in migration and restoration. Supported file types We only implement and test migration/restoration for tars which contain dirs, regular files and symbolic links. We have not tested other types, such as hard links. How to compile it in Ubuntu14? If you git clone from my github repository, it will contain a .git dir. This will turn on all gcc warnings and treat warnings as errors. To turn it off, we can pass ``--disable-gcc-warnings'' to ./configure. $ ./configure --disable-gcc-warnings --prefix=/usr/local; make -j12; make install How to use it? First, we need to remove padded bytes in a tar file, since migrate/restore does not do padding. To remove padded bytes, do $ /usr/local/bin/tar --filter -f tar-1.27.1.tar It will produce a .tar.f file (.f means filtered) To migrate a tar file, do $ /usr/local/bin/tar --migrate -f tar-1.27.1.tar.f It will produce a .tar.f.m file (tar-1.27.1.tar.f.m). To restore a tar file, do $ /usr/local/bin/tar --restore -f tar-1.27.1.tar.f.m It will produce a .tar.f file (the original tar-1.27.1.tar.f file will be overwritten.) The restored tar-1.27.1.tar.f should be exactly the same as the tar-1.27.1.tar.f that is created by migration. NOTE: + migrate.c is based on extract.c implementation. It is invoked in the same way as --list. + blocking-factor: tar writes output in records. Each record is blocking-factor * BLOCKSIZE. By default, blocking-factor is 10. To get an accurate size comparison between migrated tar and tar, we can set blocking-factor to be 1 and then the difference between migrated tar and normal tar will be 1024 bytes (tar ends with two 512 blocks). $ tar c -b 1 -f test.tar test AUTHORS: Xing Lin (University of Utah) Email: [email protected] If you have any question or find a bug in this software, please feel free to contact me. COPYING: This software is covered under the same policy as GNU tar 1.27.1. Read COPYING for more details. ================================================================================ Read README.original for README from the gnu tar distribution.
About
Migratory tar: a new tar format that deduplicates better than the original tar format
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published