Skip to content

Latest commit

 

History

History
55 lines (33 loc) · 3.67 KB

visualisation.md

File metadata and controls

55 lines (33 loc) · 3.67 KB

Data Visualisation

We now want to visualise our data. It's up to you how to do this.

Below I've provided some basic instructions on connecting Redshift to PowerBI and Google Data Studio.

Feel free to use the default table in Redshift (i.e. reddit) or the newly transformed one we created with dbt (i.e. reddit_transformed).

Google Data Studio is the better option for a personal project, as reports created here can freely and easily be shared.

Google Data Studio

  1. Navigate here and follow the setup instructions.
  2. Click Create on the top right, then Report
  3. Under Connect to data search for Amazon Redshift
  4. Enter the relevant details and click Authenticate
  5. Select your table

You can now feel free to create some visualisations. Some tutorial/guides here. Here's an example of mine:

You can then publicly share your report by navigating to Share > Manage access.

What to do once resources are terminated

One thing to note... you don't want to keep your Redshift cluster up past 2 months, as it'll incur a cost once the free trial period comes to an end. You also probably don't want the Airflow Docker containers running on your local machine all the time as this will drain resources and memory.

As such, your Redshift-Google Data Studio connection will eventually be broken. If you want to display a dashboard on your resume even after this happens, one option is to download your Redshift as a CSV as use this as the data source in Google Data Studio:

  1. Run the download_redshift_to_csv.py file under the extraction folder to download your Redshift table as a CSV to your /tmp folder. Store this CSV somewhere safe. If you want to download the transformed version of your table, you may need to amend this script slightly to include the new table name, as well as the schema.
  2. If you've already created your report in Google, try navigating to File > Make a copy, and select the CSV as the new data source. This should maintain all your existing visualisations.
  3. You could also refactor the pipeline to use CRON and PostgreSQL. You could leave this pipeline running as long as you want without incurring a charge, and your Google Data Studio report will be continuously updated with new data.

PowerBI

For PowerBI, you'll need to use Windows OS and install PowerBI Desktop. If you're on Mac or Linux, you can consider a virtualisation software like virtualbox to use Windows.

To connect Redshift to PowerBI:

  1. Create an account with PowerBI. If you don't have a work or school email address, consider setting up an account with a temporary email address, as it won't accept Gmail and other services used for personal accounts.
  2. Open PowerBI and click Get Data.
  3. Search for Redshift in the search box and click Connect.
  4. Enter your Redshift server/host name, and the name of the database (e.g. dev) and click OK.
  5. Enter the username (e.g. awsuser) and password for the database, and then select the relevant table you'd like to load in.

You can now feel free to create some visualisations. Some tutorials/guides here.


Previous Step | Next Step

or

Back to main README