Hey, this code looks perfect for a research project I'm working on. I downloaded the code through Canopy and now I'm trying to figure out how it works. Is there any documentation, or a file I should start reading to understand it better?
I don't have any plans at the moment to develop this project in the immediate future. That said, it is in a usable state, and I've used it myself fairly recently.

I'm not familiar with Canopy, but if you install this like a normal Python package, it will install a command-line tool, `wikidump`. `wikidump -h` provides some details on how to use it.

When run, `wikidump` will generate a config file, `wikidump.cfg`, in the directory it was run in. This config file contains two paths you will need to amend: `scratch`, where the indexes can be stored, and `xml_dumps`, a path to a directory containing the XML dumps downloaded from Wikipedia. I've personally been using wp-download to download the dumps, so the path that wp-download saves them to is the path you want to set `xml_dumps` to.

After downloading the relevant dumps, run `wikidump index`; thereafter you can use `wikidump dataset` to pull out a dataset. Each of the subcommands has a bit of help text, for example `wikidump dataset -h`. Let me know if you need help with figuring out how to do anything specifically, and I'll see if it can be done under the current implementation.
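For anyone finding this later, the whole workflow condenses to roughly the shell session below. This is a sketch rather than captured output: the `pip install .` step assumes a standard Python package layout in the repo root, and the example paths and config layout in the comments are placeholders, so check the `wikidump.cfg` the tool actually generates.

```sh
# Install like a normal Python package (assumes a standard setup in the repo root);
# this puts the `wikidump` command-line tool on your PATH
pip install .

# Running the tool once generates wikidump.cfg in the current directory
wikidump -h

# Edit wikidump.cfg and point the two paths at real directories, e.g.
#   scratch   = /data/wikidump-scratch    # where the indexes are stored
#   xml_dumps = /data/wp-download/dumps   # where wp-download saved the XML dumps
# (these paths and the exact file layout are illustrative; use the generated file)

# Build the indexes over the downloaded dumps
wikidump index

# Pull out a dataset; the subcommand help lists the available options
wikidump dataset -h
```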