## Downloads

dnmilne edited this page Aug 22, 2013 (2 revisions)
### Obtaining the toolkit

#### SVN checkout
The absolute latest (and likely broken) code can be checked out from the trunk using SVN:

```
svn co https://wikipedia-miner.svn.sourceforge.net/svnroot/wikipedia-miner/trunk wikipedia-miner
```

Tagged releases can be checked out in the same way, e.g.:

```
svn co https://wikipedia-miner.svn.sourceforge.net/svnroot/wikipedia-miner/tags/1.2.0 wikipedia-miner
```
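The two checkout commands differ only in the repository path: `trunk` for the development code and `tags/<version>` for a release. The minimal Python sketch below captures that layout. `checkout_command` is a hypothetical helper, not part of the toolkit, and `1.2.0` is the only release tag named on this page.

```python
# Repository root, copied from the svn checkout commands on this page.
REPO_ROOT = "https://wikipedia-miner.svn.sourceforge.net/svnroot/wikipedia-miner"


def checkout_command(tag=None, dest="wikipedia-miner"):
    """Build the `svn co` argument list (hypothetical helper).

    tag=None checks out the trunk (latest, possibly broken code);
    a tag such as "1.2.0" checks out that tagged release.
    """
    path = "trunk" if tag is None else f"tags/{tag}"
    return ["svn", "co", f"{REPO_ROOT}/{path}", dest]


if __name__ == "__main__":
    print(" ".join(checkout_command()))         # trunk
    print(" ".join(checkout_command("1.2.0")))  # tagged release
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to perform the checkout, assuming an `svn` client is installed and the repository URL is still reachable.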
### Obtaining Wikipedia data
The toolkit requires both the original XML dumps of Wikipedia and the extracted CSV summaries. Both must come from the same edition (same language and release date).

If you can't find the edition you want in the table below, you can download the XML dumps directly from the Wikimedia Foundation and extract the CSV summaries yourself. Check here for details.

Language | Edition | CSV Summaries | XML Dump |
---|---|---|---|
en | 22 July 2011 | enwiki-20110722-csv.tar.gz | enwiki-20110722-pages-articles.xml.bz2 |
en | 1 September 2011 | - | enwiki-20110901-pages-articles.xml.bz2 |
de | blah | - | - |
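Because the XML dump and the CSV summaries must come from the same edition, it can help to check a pair of files before loading them. The Python sketch below assumes the file naming seen in the table above (`<lang>wiki-<YYYYMMDD>-...`); that pattern is inferred from these examples, not a documented guarantee, and `edition` / `same_edition` are hypothetical helpers, not toolkit functions.

```python
import re

# Assumed naming pattern, inferred from the table above, e.g.
# "enwiki-20110722-csv.tar.gz" and "enwiki-20110722-pages-articles.xml.bz2".
EDITION_RE = re.compile(r"^(?P<lang>[a-z_]+)wiki-(?P<date>\d{8})-")


def edition(filename):
    """Return (language, YYYYMMDD release date) parsed from a file name."""
    m = EDITION_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognised file name: {filename!r}")
    return m.group("lang"), m.group("date")


def same_edition(xml_dump, csv_summaries):
    """True if both files share the same language and release date."""
    return edition(xml_dump) == edition(csv_summaries)


if __name__ == "__main__":
    print(same_edition("enwiki-20110722-pages-articles.xml.bz2",
                       "enwiki-20110722-csv.tar.gz"))
    print(same_edition("enwiki-20110901-pages-articles.xml.bz2",
                       "enwiki-20110722-csv.tar.gz"))
```

For example, the 1 September 2011 English dump has no CSV summaries listed, so pairing it with `enwiki-20110722-csv.tar.gz` is rejected because the release dates differ.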