forked from sbober/levitation-perl
perl port of scy's levitation
jkingdon/levitation-perl
This is a Perl port of scy's levitation. It reads MediaWiki dump files
revision by revision and writes a data stream to stdout suitable for
git fast-import.

The first 1000 pages of the German Wikipedia and all their revisions
(about 390000) can be dumped in about 15 min on relatively moderate
hardware.

INSTALL:

You need at least Perl 5.10. The Perl interpreter has to be compiled with
threads support.

You also need a working C compiler for the inline SHA1 C function. Although
gcc 4.3 was specified by old versions of levitation-perl (circa May 2010),
gcc 4.5.2 also appears to work (at least for some small wiki dumps).

You need zlib. For example, the following should work under Debian/Ubuntu:

    sudo apt-get install zlib1g-dev

You need the following modules and their dependencies from CPAN:

- Regexp::Common
- Inline
- JSON::XS
- Compress::Raw::Zlib
- Carp::Assert
- Devel::Size

- CDB_File
- XML::Bare >= 0.44
- Deep::Hash::Utils
- File::Path > 2.04 ? (2.08 is fine; it has to export make_path)

Some Linux distributions will already have the first set. Under
Debian/Ubuntu the following command should set you up:

    sudo apt-get install libregexp-common-perl \
        libinline-perl libjson-xs-perl \
        libcompress-raw-zlib-perl libcarp-assert-perl \
        libdevel-size-perl

For Fedora, the equivalent is:

    sudo yum install perl-Regexp-Common \
        perl-Inline perl-JSON-XS \
        perl-Compress-Raw-Zlib perl-Carp-Assert

For the second set, you may be able to just run:

    sudo cpan -i CDB_File
    sudo cpan -i XML::Bare
    sudo cpan -i Deep::Hash::Utils

USAGE:

First, initialize a git repository:

    cd /tmp
    mkdir blawiki
    cd blawiki
    git init

(Alternatively, go to the working directory of an existing git repository
where you want to import the dump, such as one from a previous run of
levitation-perl.)

Then, "levitate". This is a three-step process:

    cat /path/to/blawiki-dump.xml | /path/to/levitation-perl/step1.pl
    LC_ALL=C sort rev-table.txt > rev-sorted.txt
    /path/to/levitation-perl/step2.pl | /path/to/levitation-perl/gfi.pl

Alternatively, you can just change to an empty directory and call the
"levitate" helper script with the path to a dump as its parameter (may be
7z, bz2, gz or xml):

    mkdir /tmp/blawiki
    cd /tmp/blawiki
    /path/to/levitation-perl/levitate /path/to/blawiki-dump...

Lots of progress information is printed to standard error, so it may be
best to redirect that to a file.

Have fun.