dnmilne edited this page Aug 22, 2013 · 2 revisions

### Obtaining the toolkit

Latest version

Previous versions

#### SVN Checkout

The absolute latest (and likely broken) code can be checked out from the trunk using SVN:

    svn co https://wikipedia-miner.svn.sourceforge.net/svnroot/wikipedia-miner/trunk wikipedia-miner 

Tagged releases can be checked out in a similar way, e.g.:

    svn co https://wikipedia-miner.svn.sourceforge.net/svnroot/wikipedia-miner/tags/1.2.0 wikipedia-miner 

### Obtaining Wikipedia data

The toolkit requires both the original XML dumps of Wikipedia and the extracted CSV summaries. Both must come from the same edition (same language and release date).
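Because mixing editions is an easy mistake, a quick sanity check before loading the data can help. The sketch below is not part of the toolkit: the helper names `edition_of` and `same_edition` are hypothetical, and it assumes file names follow the `<lang>wiki-<YYYYMMDD>-...` pattern used by the files in the table below.

```shell
#!/bin/sh
# Hypothetical helper (not part of the toolkit): print "<lang> <date>" for a
# dump or summary file name such as enwiki-20110722-pages-articles.xml.bz2.
# Assumes the <lang>wiki-<YYYYMMDD>-... naming convention.
edition_of() {
  base=$(basename "$1")
  case $base in
    *wiki-[0-9]*) ;;
    *) echo "unrecognised file name: $1" >&2; return 1 ;;
  esac
  lang=${base%%wiki-*}   # everything before the first "wiki-", e.g. "en"
  rest=${base#*wiki-}    # everything after the first "wiki-"
  date=${rest%%-*}       # the YYYYMMDD part, up to the next "-"
  printf '%s %s\n' "$lang" "$date"
}

# Hypothetical helper: succeed only if both files belong to the same edition
# (same language and release date); fail if either name is unrecognised.
same_edition() {
  a=$(edition_of "$1") || return 1
  b=$(edition_of "$2") || return 1
  [ "$a" = "$b" ]
}
```

For example, `same_edition enwiki-20110722-pages-articles.xml.bz2 enwiki-20110722-csv.tar.gz && echo ok` prints `ok`, while pairing the July 2011 summaries with the September 2011 dump fails.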

If you can't find the edition you want in the table below, you can download the XML dumps directly from the Wikimedia Foundation and extract the CSV summaries yourself. Check here for details.
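As a rough guide to fetching a dump yourself, the sketch below builds a download URL. It assumes the `https://dumps.wikimedia.org/<lang>wiki/<YYYYMMDD>/` layout used by the Wikimedia dump server; the helper name `dump_url` is hypothetical, and older dumps are eventually removed from that server, so a given date may no longer be available there.

```shell
#!/bin/sh
# Hypothetical helper: print the URL of a pages-articles dump, assuming the
# dumps.wikimedia.org/<lang>wiki/<YYYYMMDD>/ directory layout.
# Older dates may have been removed from the server, so the URL can 404.
dump_url() {
  lang=$1   # language code, e.g. en
  date=$2   # dump date as YYYYMMDD, e.g. 20110722
  case $date in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "date must be YYYYMMDD: $date" >&2; return 1 ;;
  esac
  printf 'https://dumps.wikimedia.org/%swiki/%s/%swiki-%s-pages-articles.xml.bz2\n' \
    "$lang" "$date" "$lang" "$date"
}
```

The result can be passed to a downloader, e.g. `wget "$(dump_url en 20110722)"`.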

| Language | Edition | CSV Summaries | XML Dump |
|---|---|---|---|
| en | 22 July 2011 | enwiki-20110722-csv.tar.gz | enwiki-20110722-pages-articles.xml.bz2 |
| en | 1 September 2011 | - | enwiki-20110901-pages-articles.xml.bz2 |
| de | blah | - | - |