Nthiki edited this page Sep 9, 2022 · 18 revisions

Running the project for the first time

Pre-requisites

  1. Install VSCode and get it running on your machine
  2. Install Docker on your machine
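
As a quick sanity check of the prerequisites above, the following minimal sketch looks for the two tools on your PATH. It assumes the Docker CLI is exposed as `docker` and that VSCode's optional command-line launcher is exposed as `code`. The launcher may be missing even when VSCode is installed, so treat a miss there as a hint rather than an error. The helper name `missing_tools` is made up for this example.

```python
import shutil


def missing_tools(tools=("docker", "code")):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisite tools were found on PATH.")
```

Finding `docker` on PATH only shows that the CLI is installed; it does not confirm that the Docker daemon is running.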

Getting started

  1. Clone the repo and open it in VSCode.

  2. Open the terminal in VSCode and make sure that you are in the folder: nlp-sdg

  3. Create a new conda environment with Python 3.8.8 to host the project.

    conda create -n nlp-sdg python==3.8.8 -y
    
  4. Activate your conda environment

    conda activate nlp-sdg
    
  5. Since this is a kedro project, the first thing you will need to do is install kedro in your environment.

    pip install kedro
    

    The first time you do this, you may get some error messages about missing packages. This is expected and nothing to worry about: the next step installs them.

  6. Now that you have installed kedro, you must install all the project dependencies.

    pip install -r src/requirements.txt
    
  7. With all your dependencies installed, you can now build the Docker image that will house your kedro project.

    kedro docker build
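
If a step above fails, it can help to confirm the basics first. The sketch below is one possible pre-flight check, run from inside the activated environment. It assumes the project folder is named nlp-sdg, that the environment uses Python 3.8 (only the major and minor version are compared), and that the dependency list lives at src/requirements.txt, all as described in the steps above. The function name `preflight` is hypothetical.

```python
import sys
from pathlib import Path


def preflight(project_dir=".", expected_name="nlp-sdg", expected_python=(3, 8)):
    """Return a list of problems found; an empty list means every check passed."""
    root = Path(project_dir).resolve()
    problems = []

    # Step 2: the terminal should be inside the project folder.
    if root.name != expected_name:
        problems.append(f"current folder is '{root.name}', expected '{expected_name}'")

    # Steps 3-4: the active interpreter should come from the new environment.
    found = tuple(sys.version_info[:2])
    if found != tuple(expected_python):
        problems.append(f"Python {found[0]}.{found[1]} is active, expected "
                        f"{expected_python[0]}.{expected_python[1]}")

    # Step 6: the requirements file must exist before pip can install from it.
    if not (root / "src" / "requirements.txt").is_file():
        problems.append("src/requirements.txt not found")

    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "Pre-flight checks passed.")
```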
    

Here is some additional info regarding kedro-docker

  8. We are now ready to run the pipeline. Before doing that, make sure that you have the Kaggle train.csv dataset in the data/01_raw/ folder:

[image: the data/01_raw/ folder containing train.csv]

  9. Run the dummy pipeline with the command below:

    kedro docker run
    
  10. The pipeline will start running and you should see logs in your terminal. Here is some sample output:

[image: sample pipeline log output]
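
Before running the pipeline, you may want to confirm that the dataset is where the project expects it. The following is a minimal sketch, assuming the file sits at data/01_raw/train.csv relative to the project root and is UTF-8 encoded (the encoding is an assumption, not something this page states). The function name `raw_dataset_header` is made up for this example.

```python
import csv
from pathlib import Path


def raw_dataset_header(project_dir=".", relative_path="data/01_raw/train.csv"):
    """Return the CSV header row, or raise FileNotFoundError if the file is missing."""
    path = Path(project_dir) / relative_path
    if not path.is_file():
        raise FileNotFoundError(f"expected the dataset at {path}")
    # Read only the first row so large files are not loaded into memory.
    with path.open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


if __name__ == "__main__":
    try:
        print("Columns:", raw_dataset_header())
    except FileNotFoundError as err:
        print(err)
```

Run it from the nlp-sdg folder. A printed list of column names means the file is in place.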