diff --git a/README.md b/README.md
index a0058a5..be060d6 100644
--- a/README.md
+++ b/README.md
@@ -15,9 +15,8 @@ Other projects parsing the German Wiktionary use the XML dump, but this has the

 You should download the newest dewiktionarydump-NS0 dump from https://dumps.wikimedia.org/other/enterprise_html/runs/

-Then should clone the project, install poetry and then run `poetry install`.
-
+Then should clone the project, install poetry and then run `poetry install`.
 You can run python files like `poetry run python main.py`. Main.py downloads the basic data from HTML dump. Scrape_category_html.py downloads and parses the Flexion pages that are not included in the HTML dump. Generate_dictionary.py generates a Tabfile dictionary, but this can be changed to one of the many output formats that pyglossary supports.

 ### Interesting projects
-For the English wiktionary I can recommend https://kaikki.org/, which has parsed the Wiktionary data in a great
\ No newline at end of file
+For the English wiktionary I can recommend https://kaikki.org/, which has parsed the Wiktionary data in a great format and also retains all grammatical information for inflections, among other things.
\ No newline at end of file