Skip to content
/ dedup Public

Try not to download the same file twice. Improve cache efficiency and speed up downloads.

Notifications You must be signed in to change notification settings

jablko/dedup

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 

Repository files navigation

                               M�Me�et�ta�al�li�in�nk�k

   Try not to download the same file twice.  Improve cache efficiency
   and speed up downloads.

   Take standard headers and knowledge about objects in the cache and
   potentially rewrite those headers so that a client will use a URL
   that's already cached instead of one that isn't.  The headers are
   specified in [RFC 6249] (Metalink/HTTP: Mirrors and Hashes) and
   [RFC 3230] (Instance Digests in HTTP) and are sent by various
   download redirectors or content distribution networks.


1�1.�.  W�Wh�ho�o C�Ca�ar�re�es�s?�?

   More important than saving a little bandwidth, this saves users
   from frustration.

   A lot of download sites distribute the same files from many
   different mirrors and users don't know which mirrors are already
   cached.  These sites often present users with a simple download
   button, but the button doesn't predictably access the same mirror,
   or a mirror that's already cached.  To users it seems like the
   download works sometimes (takes seconds) and not others (takes
   hours), which is frustrating.

   An extreme example of this happens when users share a limited,
   possibly unreliable internet connection, as is common in parts of
   Africa for example.

   [How to cache openSUSE repositories with Squid] is another,
   different example of a use case where picking a URL that's already
   cached is valuable.


2�2.�.  W�Wh�ha�at�t i�it�t D�Do�oe�es�s

   When it sees a response with a "Location: ..." header and a
   "Digest: SHA-256=..." header, it checks if the URL in the Location
   header is already cached.  If it isn't, then it tries to find a URL
   that is cached to use instead.  It looks in the cache for some
   object that matches the digest in the Digest header and if it
   succeeds, then it rewites the Location header with that object's
   URL.

   This way a client should get sent to a URL that's already cached
   and won't download the file again.


3�3.�.  H�Ho�ow�w t�to�o U�Us�se�e i�it�t

   Just build the plugin and add it to your plugin.config file.

   The code is distributed along with recent versions of Traffic
   Server, in the plugins/experimental/metalink directory.  To build
   it, pass the --enable-experimental-plugins option to the configure
   script when you build Traffic Server:

   <pre>$ ./configure --enable-experimental-plugins</pre>

   When you're done building Traffic Server, add "metalink.so" to your
   plugin.config file to start using the plugin.


4�4.�.  R�Re�ea�ad�d M�Mo�or�re�e

   More details are on the [wiki page] in the Traffic Server wiki.


   [RFC 6249]    http://tools.ietf.org/html/rfc6249

   [RFC 3230]    http://tools.ietf.org/html/rfc3230

   [How to cache openSUSE repositories with Squid]
                 http://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid

   [wiki page]   https://cwiki.apache.org/confluence/display/TS/Metalink

About

Try not to download the same file twice. Improve cache efficiency and speed up downloads.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published