Skip to content
/ mtar Public

Migratory tar: a new tar format that deduplicates better than the original tar format

License

Notifications You must be signed in to change notification settings

xinglin/mtar

Repository files navigation

================================================================================
This is a modified version of tar that supports transformation between tar 
and mtar. The tar format wasn't designed with deduplication in mind and
thus does not deduplicate well. To solve this problem, we propose a 
new format mtar (short for Migratory tar). One can transform a tar file
into mtar, by migrating a tar file. An mtar file can be restored into a tar
file. For more details, please check out our paper (Metadata Considered Harmful ... to Deduplication), published at hotstorage15. 

What has been changed?
  We added a few files that does the migration and restoration. They are 
  migrate.c and restore.c, included in the src dir. Both of them are
  based on extract.c (which implements extraction of a tar file). 
  migrate.c scans a tar file and outputs data blocks. The header blocks
  are stored in a temporary file (.h) and then appended to the 
  output, to build the final mtar file. restore.c works in a similar
  way but in a reverse direction as migration. It reads the first block
  of a mtar file, from which it can extract the number of header blocks. 
  Then, it seeks to the first header block and start to reconstruct 
  the original tar file, by outputing a header block and then data blocks
  for the first data file. This process repeats until all header blocks 
  are processed.  

  In addition, we added filter.c, which removes the padded blocks at the 
  end of a tar file. Padding was not implemented in migration and restoration.

Supported file types
  We only implement and test migration/restoration for tars which 
  contain dirs, regular files and symbolic links. We have not tested other 
  types, such as hard links. 

How to compile it in Ubuntu14?
	If you git clone from my github repository, it will contain a .git dir. This
will turn on all gcc warnings and treat warnings as errors. To turn it off, we can pass ``--disable-gcc-warnings'' to ./configure.
 
     $ ./configure --disable-gcc-warnings --prefix=/usr/local; make -j12; make install


How to use it?

  First, we need to remove padded bytes in a tar file, since migrate/restore
  does not do padding. 

  To remove padded bytes, do
    $ /usr/local/bin/tar --filter -f tar-1.27.1.tar

  It will produce a .tar.f file (.f means filtered)

  To migrate a tar file, do
    $ /usr/local/bin/tar --migrate -f tar-1.27.1.tar.f

  It will produce a .tar.f.m file (tar-1.27.1.tar.f.m).

  To restore a tar file, do
    $ /usr/local/bin/tar --restore -f tar-1.27.1.tar.f.m

  It will produce a .tar.f file (the original tar-1.27.1.tar.f file will be
  overwritten.)

  The restored tar-1.27.1.tar.f should be exactly the same as the 
  tar-1.27.1.tar.f that is created by migration.


NOTE:
  + migrate.c is based on extract.c implementation. It is invoked
    in the same way as --list.
  + blocking-factor: tar writes output in records. Each record is 
    blocking-factor * BLOCKSIZE. By default, blocking-factor is 10. 
    To get an accurate size comparison between migrated tar and tar, we can set 
    blocking-factor to be 1 and then the difference between migrated tar and 
    normal tar will be 1024 bytes (tar ends with two 512 blocks). 

   $ tar c -b 1 -f test.tar test

AUTHORS:
  Xing Lin (University of Utah)
  Email: [email protected]

  If you have any question or find a bug in this software, please feel 
  free to contact me. 

COPYING:
  This software is covered under the same policy as GNU tar 1.27.1. 
  Read COPYING for more details. 

================================================================================
Read README.original for README from the gnu tar distribution.

About

Migratory tar: a new tar format that deduplicates better than the original tar format

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published