Old Tutorial
This page explains how to use the dump generator.
- We use dumpgenerator.py (download), available at the repository.*
If you have further questions, you can send a message to our mailing list. If you detect any error, report an issue. *Be bold!*
- What kind of backups exist?*
- What does an XML dump contain?*
An XML dump may be "current" or "history". A "current" dump contains only the latest revision of every page. A "history" dump contains the full revision history of every page, which is better for historical and research purposes.
- What does an image dump contain?*
- How can I make an XML backup?*
If you have no shell access, then use our dump generator. If the wiki you want to back up has an API, use this:
* _python dumpgenerator.py --api=http://wikidomain.com/path/to/api.php --xml_
If the API is not available, then use this:
* _python dumpgenerator.py --index=http://wikidomain.com/path/to/index.php --xml_
- What if I only want the last version of every page, not the full history?*
- How can I make an images backup?*
If you have no shell access, then use the same method as in the question above, but replace _--xml_ with _--images_.
If the wiki you want to back up has an API, use this:
* _python dumpgenerator.py --api=http://wikidomain.com/path/to/api.php --images_
If the API is not available, then use this:
* _python dumpgenerator.py --index=http://wikidomain.com/path/to/index.php --images_
- How can I back up both XML and images?*
* _python dumpgenerator.py --api=http://wikidomain.com/path/to/api.php --xml --images_
- Which method is better, API or index.php?*
- How can I know the path of the API or index.php?*
If you want the API URL, remove everything from _index.php_ onward and put _api.php_ in its place (in the example, this would give _http://www.example.com/.../api.php_). If you then see a web page with the API documentation, you got it right. Otherwise, either the API is located at a different path or the URL was edited incorrectly.
If you want the index.php URL, remove only the part of the URL from the '?' symbol onward.
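The URL editing described above can be sketched in a few lines of Python. This is only an illustration: the function name `derive_endpoints` is hypothetical and not part of dumpgenerator.py, and it assumes the page URL contains `index.php` in its path. The resulting API URL is only a guess and still has to be checked in a browser as described above.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_endpoints(page_url):
    """Guess (api_url, index_url) from a wiki page URL containing index.php.

    Hypothetical helper for illustration; not part of dumpgenerator.py.
    """
    parts = urlsplit(page_url)
    pos = parts.path.find("index.php")
    if pos == -1:
        raise ValueError("URL path does not contain index.php")
    base = parts.path[:pos]
    # Passing empty query and fragment drops everything from '?' onward.
    # For the usual index.php?title=... form this matches the manual steps.
    index_url = urlunsplit((parts.scheme, parts.netloc, base + "index.php", "", ""))
    # Replace everything from index.php onward with api.php.
    api_url = urlunsplit((parts.scheme, parts.netloc, base + "api.php", "", ""))
    return api_url, index_url


api_url, index_url = derive_endpoints(
    "http://wikidomain.com/path/to/index.php?title=Main_Page&action=history"
)
print(api_url)
print(index_url)
```

If the wiki serves its API from a different path than index.php, the guessed API URL will not work and has to be found by hand.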
- Can I resume a dump?*
* _python dumpgenerator.py --api=http://wikidomain.com/path/to/api.php --xml --images --resume --path=dumpdirectory_
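When backing up several wikis, it can help to assemble these command lines programmatically. The following is a minimal sketch that uses only the options shown on this page (`--api`, `--index`, `--xml`, `--images`, `--resume`, `--path`). The helper name `build_dump_command` and its parameters are hypothetical, and the sketch only builds the argument list; it does not run anything.

```python
import shlex


def build_dump_command(url, use_api=True, xml=True, images=False, resume_path=None):
    """Build a dumpgenerator.py argument list from the options on this page.

    Hypothetical helper for illustration; not part of dumpgenerator.py.
    """
    if not (xml or images):
        raise ValueError("select at least one of xml or images")
    # Use --api when the wiki exposes api.php, otherwise fall back to --index.
    endpoint_flag = "--api" if use_api else "--index"
    cmd = ["python", "dumpgenerator.py", f"{endpoint_flag}={url}"]
    if xml:
        cmd.append("--xml")
    if images:
        cmd.append("--images")
    if resume_path is not None:
        # Resuming needs the directory of the interrupted dump.
        cmd += ["--resume", f"--path={resume_path}"]
    return cmd


cmd = build_dump_command(
    "http://wikidomain.com/path/to/api.php",
    images=True,
    resume_path="dumpdirectory",
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd)` from the directory that contains dumpgenerator.py.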
- Can I add a delay between server requests?*
- I want to import an XML/image dump, how can I do it?*
- How can I check the XML dump integrity?*
* `grep "<title>" *.xml -c;grep "<page>" *.xml -c;grep "</page>" *.xml -c;grep "<revision>" *.xml -c;grep "</revision>" *.xml -c`
You should see something similar to this. The numbers themselves will differ; what matters is that the first three are equal to each other and the last two are equal to each other:
* 580
* 580
* 580
* 5677
* 5677
If your first three numbers differ from each other, or your last two numbers differ from each other, then your XML dump is corrupt (it contains one or more unfinished `</page>` or `</revision>` elements). This is not common in small wikis, but large or very large wikis may fail this check because XML pages get truncated while exporting and merging. The solution is to remove the XML dump and resume (this means downloading again, which is tedious and can fail again), or, if you don't mind losing the corrupt pages, to exclude them with this script (TO DO).
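The same comparison can be done without reading the numbers by eye. Below is a minimal Python sketch of the grep check above: like `grep -c`, it counts the lines that contain each tag, then tests the two equalities. The function names are hypothetical, and the check is only as strong as the grep version: it detects unbalanced tags but does not validate the XML.

```python
TAGS = ("<title>", "<page>", "</page>", "<revision>", "</revision>")


def count_tag_lines(lines):
    """Count, per tag, the lines that contain it (what grep -c reports)."""
    counts = dict.fromkeys(TAGS, 0)
    for line in lines:
        for tag in TAGS:
            if tag in line:
                counts[tag] += 1
    return counts


def looks_intact(counts):
    """True when title/page counts agree and revision counts agree."""
    page_counts = {counts["<title>"], counts["<page>"], counts["</page>"]}
    revision_counts = {counts["<revision>"], counts["</revision>"]}
    return len(page_counts) == 1 and len(revision_counts) == 1


def check_dump(path):
    """Stream an XML dump from disk line by line and run the check."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return looks_intact(count_tag_lines(handle))


# Tiny in-memory example: the second page is truncated mid-revision.
complete = ["<page>", "<title>A</title>", "<revision>", "</revision>", "</page>"]
truncated = complete + ["<page>", "<title>B</title>", "<revision>"]
print(looks_intact(count_tag_lines(complete)))
print(looks_intact(count_tag_lines(truncated)))
```

Reading the file line by line keeps memory use low, which matters for the large dumps where truncation is most likely.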