At Cytora we use NLP to extract and analyse plain text to build our structured information product.
This is the repo for our workshop at PyCon UK. In this repository you will find the step-by-step tutorial from the workshop, covering some basic Natural Language Processing tasks using spaCy, a powerful (and super fast) NLP library.
Clone this repo from GitHub and open the directory. On a UNIX machine, these actions look like this:
git clone https://github.com/cytora/pycon-nlp-in-10-lines.git
cd pycon-nlp-in-10-lines
We recommend installing all the required dependencies in a virtual environment such as virtualenv; however, this step can be skipped.
virtualenv -p python3 venv
source venv/bin/activate
If you are using the Miniconda release of Python, you can use conda virtual environments instead, so your setup will be slightly different:
conda create --name venv python=3
source activate venv
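Before installing anything, it can help to confirm that the interpreter you are about to use really belongs to the environment you just activated. The snippet below is a minimal sketch using only the standard library, not part of the workshop material. The function name `detect_environment` is hypothetical, and the conda check assumes that conda exports the `CONDA_DEFAULT_ENV` variable on activation.

```python
import os
import sys


def detect_environment():
    """Return 'virtualenv', 'conda', or 'none' for the running interpreter."""
    # venv and recent virtualenv releases point sys.prefix at the environment,
    # while sys.base_prefix keeps pointing at the base installation.
    # Older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    if hasattr(sys, "real_prefix") or sys.prefix != base_prefix:
        return "virtualenv"
    # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is active.
    if os.environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    return "none"


print("Active environment:", detect_environment())
print("Interpreter prefix:", sys.prefix)
```

If this reports `none`, the packages installed in the next step will go into your system-wide Python rather than into `venv`.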
To install the Python dependencies required for this tutorial, run this command in the cloned directory:
pip install -r requirements.txt
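Once the install finishes, a quick check like the following can confirm that the main packages are importable from the current interpreter. This is a rough sketch, not part of the workshop material. The helper `is_installed` is hypothetical, and the module names `spacy` and `notebook` are assumptions about what `requirements.txt` pulls in, so adjust the list to match the file. It only checks that the packages can be found; it does not verify that any spaCy model data is present.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Assumed module names; edit to match the entries in requirements.txt.
for name in ("spacy", "notebook"):
    status = "found" if is_installed(name) else "MISSING"
    print(name, status)
```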
To install the spaCy model, run:
sputnik --name spacy --repository-url http://index.spacy.io install en==1.1.0
To start Jupyter Notebook, run:
jupyter notebook
The tutorial has three parts:
- 00_spacy_intro.ipynb - Introduction to spaCy
- 01_pride_and_predjudice.ipynb - Real text analysis (Pride & Prejudice) (blogpost)
- 02_rand_dataset - Open task on RAND dataset (blogpost)