We now want to visualise our data. It's up to you how to do this.
Below I've provided some basic instructions on connecting Redshift to PowerBI and Google Data Studio.
Feel free to use the default table in Redshift (i.e. reddit) or the newly transformed one we created with dbt (i.e. reddit_transformed).
Google Data Studio is the better option for a personal project, as reports created here can freely and easily be shared.
- Navigate here and follow the setup instructions.
- Click
Create
on the top right, thenReport
- Under
Connect to data
search forAmazon Redshift
- Enter the relevant details and click
Authenticate
- Select your table
You can now feel free to create some visualisations. Some tutorial/guides here. Here's an example of mine:
You can then publicly share your report by navigating to Share > Manage access.
One thing to note... you don't want to keep your Redshift cluster up past 2 months, as it'll incur a cost once the free trial period comes to an end. You also probably don't want the Airflow Docker containers running on your local machine all the time as this will drain resources and memory.
As such, your Redshift-Google Data Studio connection will eventually be broken. If you want to display a dashboard on your resume even after this happens, one option is to download your Redshift as a CSV as use this as the data source in Google Data Studio:
- Run the
download_redshift_to_csv.py
file under theextraction
folder to download your Redshift table as a CSV to your/tmp
folder. Store this CSV somewhere safe. If you want to download the transformed version of your table, you may need to amend this script slightly to include the new table name, as well as the schema. - If you've already created your report in Google, try navigating to File > Make a copy, and select the CSV as the new data source. This should maintain all your existing visualisations.
- You could also refactor the pipeline to use CRON and PostgreSQL. You could leave this pipeline running as long as you want without incurring a charge, and your Google Data Studio report will be continuously updated with new data.
For PowerBI, you'll need to use Windows OS and install PowerBI Desktop. If you're on Mac or Linux, you can consider a virtualisation software like virtualbox to use Windows.
To connect Redshift to PowerBI:
- Create an account with PowerBI. If you don't have a work or school email address, consider setting up an account with a temporary email address, as it won't accept Gmail and other services used for personal accounts.
- Open PowerBI and click
Get Data
. - Search for
Redshift
in the search box and clickConnect
. - Enter your Redshift server/host name, and the name of the database (e.g. dev) and click
OK
. - Enter the username (e.g. awsuser) and password for the database, and then select the relevant table you'd like to load in.
You can now feel free to create some visualisations. Some tutorials/guides here.
or