Home

Running the project for the first time

Clone the repo and open it in VSCode.
Open the terminal in VSCode and make sure that you are in the nlp-sdg folder.
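If you are unsure which folder the terminal is in, a small check like the one below can help. This is only a sketch: the helper name in_project_root is made up for illustration, and it assumes the cloned folder kept the name nlp-sdg.

```shell
# Hypothetical helper: succeeds only when the current directory is named nlp-sdg.
in_project_root() {
  [ "$(basename "$PWD")" = "nlp-sdg" ]
}

if in_project_root; then
  echo "OK: you are in the nlp-sdg folder"
else
  echo "Not in nlp-sdg (current folder: $PWD)"
fi
```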
Create a new conda environment with Python 3.8.8 to host the project:
```
conda create -n nlp-sdg python==3.8.8 -y
```
Activate your conda environment:
```
conda activate nlp-sdg
```
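To confirm that the activation worked, you can look at the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment. The helper name active_env_is below is hypothetical and only serves this sketch.

```shell
# Hypothetical helper: succeeds when the active conda environment has the given name.
# CONDA_DEFAULT_ENV is set by conda on activation; it is empty if no environment is active.
active_env_is() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if active_env_is nlp-sdg; then
  echo "OK: nlp-sdg is active"
else
  echo "nlp-sdg is not active (current: ${CONDA_DEFAULT_ENV:-none})"
fi
```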
Since this is a Kedro project, the first thing you need to do is install Kedro in your environment:
```
pip install kedro
```
The first time you do this, you may get some error messages about missing packages. This is expected and nothing to worry about: the next step installs them.
Now that you have installed Kedro, install all the project dependencies:
```
pip install -r src/requirements.txt
```
We are now ready to build and run the pipeline. Before doing that, make sure that you have the Kaggle train.csv dataset under the data/01_raw/ folder.
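One way to verify the dataset location from the terminal is sketched below. The helper name raw_data_present is made up for this example; it takes the project root as an optional argument and defaults to the current folder.

```shell
# Hypothetical helper: succeeds when train.csv exists under data/01_raw/
# relative to the given project root (defaults to the current directory).
raw_data_present() {
  [ -f "${1:-.}/data/01_raw/train.csv" ]
}

if raw_data_present; then
  echo "OK: data/01_raw/train.csv found"
else
  echo "Missing: data/01_raw/train.csv"
fi
```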

With all your dependencies installed and the training dataset in the correct folder, you can now build the Docker image that will house your Kedro project:
```
kedro docker build
```
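If the build fails right away, it may be worth checking that the tools used in this guide are available on your PATH. The sketch below is an assumption about what is useful to check, not part of the project; the helper name have_cmd is hypothetical.

```shell
# Hypothetical helper: succeeds when the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report the tools this guide relies on.
for tool in conda kedro docker; do
  if have_cmd "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```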

Here is some additional info regarding kedro-docker

Now use the command below to run the dummy pipeline:
```
kedro docker run
```
The pipeline will start running and you should see some logs in your terminal. Here is some sample output: