Simple Python class for performing text pre-processing on a CSV file. Developed as part of the BRAID project at the University of Sheffield.
Install conda on your machine. Miniconda is recommended: https://docs.conda.io/en/latest/miniconda.html
To set up the Conda environment for running preprocessing.py on a Unix machine, follow the steps below:
- Navigate to the project directory:
  cd your-directory-where-the-code-is-stored
- Create the Conda environment from the environment.yml file:
  conda env create -f environment.yml
- Activate the Conda environment:
  conda activate braid
- Make preprocessing.py executable:
  chmod +x preprocessing.py
- Run preprocessing.py:
  python preprocessing.py
preprocessing.py can be run with command line arguments. The following are available:
- -h, --help: Shows the help message and exits.
- --stop: Applies stopword removal to text.
- --stem: Applies stemming to text. Cannot be used in conjunction with --lemma.
- --lemma: Applies lemmatization to text. Cannot be used in conjunction with --stem.
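The options above could be declared with Python's standard argparse module roughly as follows. This is a minimal sketch, not the actual contents of preprocessing.py: the function name build_parser is hypothetical, and it is an assumption that the script enforces the --stem/--lemma restriction with a mutually exclusive group.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="preprocessing.py",
        description="Text pre-processing on a CSV file.",
    )
    # -h/--help is added automatically by argparse.
    # --stop is independent of the other two options.
    parser.add_argument(
        "--stop", action="store_true",
        help="Applies stopword removal to text.",
    )
    # --stem and --lemma cannot be combined, so they share an exclusive group.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--stem", action="store_true",
        help="Applies stemming to text.",
    )
    group.add_argument(
        "--lemma", action="store_true",
        help="Applies lemmatization to text.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--stop", "--lemma"])
print(args.stop, args.stem, args.lemma)  # True False True
```

With a parser built this way, passing both --stem and --lemma makes argparse print an error and exit with status 2, which matches the restriction described in the list above.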