01-docker-terraform

Introduction

We suggest watching the videos in the order they appear in this document.

The last video (setting up the environment) is optional, but you can watch it earlier if you have trouble setting up the environment and following along with the videos.

Docker + Postgres

Code

  • Why do we need Docker
  • Creating a simple "data pipeline" in Docker
  • Running Postgres locally with Docker
  • Using pgcli for connecting to the database
  • Exploring the NY Taxi dataset
  • Ingesting the data into the database
  • Note: if you have problems with pgcli, check this video for an alternative way to connect to your database
  • The pgAdmin tool
  • Docker networks

Note: The pgAdmin 4 UI has changed. Follow the steps below to create a server:

  • After logging in to pgAdmin, right-click Servers in the left sidebar.
  • Click Register.
  • Click Server.
  • The remaining steps to create a server are the same as in the videos.

  • Converting the Jupyter notebook to a Python script
  • Parametrizing the script with argparse
  • Dockerizing the ingestion script
  • Why do we need Docker-compose
  • Docker-compose YAML file
  • Running multiple containers with docker-compose up
  • Adding the Zones table
  • Inner joins
  • Basic data quality checks
  • Left, Right and Outer joins
  • Group by
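As a rough illustration of the "Parametrizing the script with argparse" step listed above, here is a minimal sketch of a command-line interface for an ingestion script. The option names (`--user`, `--table-name`, `--url`, and so on), the example values, and the `connection_url` helper are assumptions made for this sketch; the script shown in the videos may use different ones.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser for a hypothetical Postgres ingestion script."""
    parser = argparse.ArgumentParser(description="Ingest CSV data into Postgres")
    # Every option name below is an illustrative assumption, not the course's exact flag.
    parser.add_argument("--user", required=True, help="Postgres user name")
    parser.add_argument("--password", required=True, help="Postgres password")
    parser.add_argument("--host", default="localhost", help="Postgres host")
    parser.add_argument("--port", type=int, default=5432, help="Postgres port")
    parser.add_argument("--db", required=True, help="database name")
    parser.add_argument("--table-name", required=True, help="destination table")
    parser.add_argument("--url", required=True, help="URL of the CSV file to ingest")
    return parser


def connection_url(args: argparse.Namespace) -> str:
    """Assemble a Postgres connection URL from the parsed arguments."""
    return f"postgresql://{args.user}:{args.password}@{args.host}:{args.port}/{args.db}"


# Parse an explicit argument list (placeholder values) so the sketch runs
# without a real command line.
example_args = build_parser().parse_args(
    [
        "--user", "root",
        "--password", "root",
        "--db", "ny_taxi",
        "--table-name", "yellow_taxi_trips",
        "--url", "https://example.com/trips.csv",
    ]
)
print(connection_url(example_args))
```

Keeping the connection details as arguments rather than hard-coded values is what lets the same script run unchanged inside a container, where the host name and credentials typically differ from a local setup.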

🎥 Optional: Docker Networking and Port Mapping

Optional: If you have problems with Docker networking, check Port Mapping and Networks in Docker

  • Docker networks
  • Port forwarding to the host environment
  • Communicating between containers in the network
  • .dockerignore file

🎥 Optional: Walk-Through on WSL

Optional: If you want to do the steps from "Ingesting NY Taxi Data to Postgres" through "Running Postgres and pgAdmin with Docker-Compose" with Windows Subsystem for Linux (WSL), check Docker Module Walk-Through on WSL

GCP

🎥 Introduction to GCP (Google Cloud Platform)

Video

Terraform

Code

🎥 Introduction to Terraform: Concepts and Overview

🎥 Terraform Basics: Simple one file Terraform Deployment

🎥 Deployment with a Variables File

Configuring Terraform and the GCP SDK on Windows

Environment setup

For the course you'll need:

  • Python 3 (e.g. installed with Anaconda)
  • Google Cloud SDK
  • Docker with docker-compose
  • Terraform
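One quick way to see which of these tools are already installed is to look them up on your `PATH`. The sketch below does that with Python's standard library. The executable names are assumptions that vary by platform and install method (for example, Python may be `python` rather than `python3` on Windows), and Docker Compose is not checked separately because it may be installed either as a `docker-compose` executable or as a `docker compose` plugin.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for the tools listed above; adjust for your platform.
REQUIRED_TOOLS = ["python3", "gcloud", "docker", "terraform"]


def find_missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

A tool that is found on `PATH` may still be misconfigured, so treat this only as a first check.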

If you have problems setting up the environment, check these videos:

🎥 GitHub Codespaces

Preparing the environment with GitHub Codespaces

🎥 GCP Cloud VM

Setting up the environment on cloud VM

  • Generating SSH keys
  • Creating a virtual machine on GCP
  • Connecting to the VM with SSH
  • Installing Anaconda
  • Installing Docker
  • Creating SSH config file
  • Accessing the remote machine with VS Code and SSH remote
  • Installing docker-compose
  • Installing pgcli
  • Port forwarding with VS Code: connecting to pgAdmin and Jupyter from the local computer
  • Installing Terraform
  • Using sftp to copy the credentials to the remote machine
  • Shutting down and removing the instance

Homework

Community notes

Did you take notes? You can share them here