
Commit

Update readme
Vuizur committed Jun 13, 2022
1 parent dff42b5 commit d5473f1
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -15,9 +15,8 @@ Other projects parsing the German Wiktionary use the XML dump, but this has the

First, download the newest dewiktionarydump-NS0 dump from https://dumps.wikimedia.org/other/enterprise_html/runs/
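Because the dump is one large archive, it is usually easier to stream it than to unpack it first. The sketch below assumes the Wikimedia Enterprise HTML layout: a `.tar.gz` archive of NDJSON files, one JSON article per line, with the page title in `name` and the rendered page in `article_body.html`. The function `iter_dump_articles` is hypothetical and is not part of this project.

```python
import io
import json
import tarfile
from typing import Iterator, Tuple


def iter_dump_articles(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (page title, article HTML) pairs from an Enterprise HTML dump.

    Assumption: the archive holds NDJSON files in which every non-empty line
    is a JSON object with a "name" field and an "article_body" -> "html" field.
    """
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # Decode and read line by line so the whole dump never sits in memory.
            for raw_line in io.TextIOWrapper(handle, encoding="utf-8"):
                if not raw_line.strip():
                    continue
                record = json.loads(raw_line)
                yield record["name"], record["article_body"]["html"]
```

For example, `for title, html in iter_dump_articles(path): ...`, where `path` points to the downloaded archive; each `html` string can then be handed to any HTML parser.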


Then clone the project, install Poetry, and run `poetry install`. Run the Python scripts through Poetry, e.g. `poetry run python main.py`:

- `main.py` extracts the basic data from the HTML dump.
- `scrape_category_html.py` downloads and parses the Flexion (inflection table) pages, which are not included in the HTML dump.
- `generate_dictionary.py` generates a Tabfile dictionary; this can be changed to any of the many output formats that PyGlossary supports.
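A Tabfile is a plain-text dictionary with one entry per line: the headword, a tab, then the definition. The sketch below shows how such lines could be written, assuming PyGlossary's convention of escaping backslashes, tabs and newlines inside a field as `\\`, `\t` and `\n`. The helper names are hypothetical and are not taken from `generate_dictionary.py`.

```python
from typing import Iterable, Tuple


def escape_tabfile_field(text: str) -> str:
    """Escape characters that would break the one-entry-per-line layout.

    Assumption: the reader (e.g. PyGlossary) unescapes \\\\, \\t and \\n.
    """
    # Escape backslashes first so the escapes added below are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


def to_tabfile_lines(entries: Iterable[Tuple[str, str]]) -> str:
    """Render (headword, definition) pairs as 'headword<TAB>definition' lines."""
    return "".join(
        f"{escape_tabfile_field(word)}\t{escape_tabfile_field(definition)}\n"
        for word, definition in entries
    )
```

For example, `to_tabfile_lines([("Haus", "house\nbuilding")])` returns `"Haus\thouse\\nbuilding\n"`: the real tab separates the fields, while the newline inside the definition is stored as a literal `\n`.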

### Interesting projects

For the English Wiktionary I can recommend https://kaikki.org/, which provides the parsed Wiktionary data in a convenient format and, among other things, retains all grammatical information for inflections.
