yipdw edited this page Aug 30, 2014 · 3 revisions


There are a few things that ArchiveBot cannot yet fetch correctly. Here is a list of known deficiencies:

  • Most content loaded by JavaScript. ArchiveBot's crawler uses heuristics to guess at URLs that JavaScript may generate, but this is not the same as actually running the JavaScript. ArchiveBot has an experimental PhantomJS mode that may retrieve more assets loaded this way.
  • Video and audio loaded via Flash applets, or Flash applets inserted via JavaScript loaders. Flash applets embedded with an <object> element are saved, however.
  • Web fonts supplied by Typekit or similar services. ArchiveBot will grab the JavaScript loader, but Typekit's loader enforces domain restrictions that prevent the font from actually being loaded.
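
To illustrate why guessing at URLs is not the same as running JavaScript, here is a minimal sketch of a guessing heuristic. It is not ArchiveBot's actual code: the `guess_urls` function, the regular expression, and the example URLs are all hypothetical, chosen only to show the general idea of scanning script text for string literals that look like URLs or paths.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern (an assumption, not ArchiveBot's real rule set):
# a quoted string literal that is an absolute or protocol-relative URL,
# a root-relative path, or a relative path ending in a common asset extension.
_CANDIDATE = re.compile(
    r"""["']((?:https?:)?//[^"'\s]+|/[^"'\s]*|[\w./-]+\.(?:js|css|png|jpe?g|gif|json|html?))["']"""
)


def guess_urls(script_text, base_url):
    """Return URL guesses found in script_text, resolved against base_url."""
    found = []
    for match in _CANDIDATE.finditer(script_text):
        url = urljoin(base_url, match.group(1))
        if url not in found:  # keep first-seen order, drop duplicates
            found.append(url)
    return found


script = 'var img = "/static/logo.png"; load("http://cdn.example.com/app.js"); var u = "/api/" + id;'
for url in guess_urls(script, "http://example.com/page/index.html"):
    print(url)
```

In this example the two literal URLs are recovered, but the third guess is only `http://example.com/api/`. The real request would be `/api/` plus the value of `id`, which is known only when the script runs. A heuristic of this kind cannot recover it, which is the gap a browser-based mode such as PhantomJS is meant to narrow.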
