Hey, this code looks perfect for a research project I'm working on. I downloaded the code through Canopy and now I'm trying to figure out how it works. Is there any documentation, or a file I should start reading to understand it better?
I don't have any plans at the moment to develop this project in the immediate future. That said, it is in a usable state, and I've used it myself fairly recently.

I'm not familiar with Canopy, but if you install this like a normal Python package, it will install a command-line tool, `wikidump`. `wikidump -h` provides some details on how to use it.

When run, `wikidump` will generate a config file, `wikidump.cfg`, in the directory it was run in. This config file contains two paths you will need to amend: `scratch`, where the indexes can be stored, and `xml_dumps`, a path to a directory containing the XML dumps downloaded from Wikipedia. I've personally been using wp-download to download the dumps, so the path that wp-download saves them to is the path you want to set `xml_dumps` to.

After downloading the relevant dumps, run `wikidump index`; thereafter you can use `wikidump dataset` to pull out a dataset. Each of the subcommands has a bit of help text, for example `wikidump dataset -h`. Let me know if you need help with figuring out how to do anything specifically, and I'll see if it can be done under the current implementation.
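For anyone finding this later, the whole workflow condenses to roughly the shell session below. This is a sketch rather than captured output: the `pip install .` step assumes a standard Python package layout in the repo root, and the example paths and config layout in the comments are placeholders, so check the `wikidump.cfg` the tool actually generates.

```sh
# Install like a normal Python package (assumes a standard setup in the repo root);
# this puts the `wikidump` command-line tool on your PATH
pip install .

# Running the tool once generates wikidump.cfg in the current directory
wikidump -h

# Edit wikidump.cfg and point the two paths at real directories, e.g.
#   scratch   = /data/wikidump-scratch    # where the indexes are stored
#   xml_dumps = /data/wp-download/dumps   # where wp-download saved the XML dumps
# (these paths and the exact file layout are illustrative; use the generated file)

# Build the indexes over the downloaded dumps
wikidump index

# Pull out a dataset; the subcommand help lists the available options
wikidump dataset -h
```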