kwnlp_preprocessor is a Python package to help you convert raw Wikimedia data to standard formats.
# Install the pre-commit setup (linters in our case)
pip install pre-commit
pre-commit install
pip install . # This package is not on PyPI yet
# or "pip install -e ." to install in editable mode
This code is not battle-tested production code. It is mostly used by the R&D team to prototype new ideas using Wikimedia data.