Training: Writing Python scripts in Snowflake directly for data analysis #466

Open
Tracked by #103
jkarpen opened this issue Nov 7, 2024 · 5 comments

jkarpen commented Nov 7, 2024

This will be a training on doing data analysis within Snowflake using Python scripts, so that analysts can avoid downloading the data and analyzing it locally.

jkarpen changed the title from "Writing Python scripts in Snowflake directly for data analysis (vs downloading the data) - Britt has experience with this" to "Writing Python scripts in Snowflake directly for data analysis" on Nov 7, 2024
jkarpen changed the title from "Writing Python scripts in Snowflake directly for data analysis" to "Training: Writing Python scripts in Snowflake directly for data analysis" on Nov 8, 2024

jkarpen commented Nov 19, 2024

Notes from Mintu on the goals for what this session can look like:

Benefits:

  • It avoids downloading data from Snowflake to a local machine in order to analyze PeMS data. That way we do not exhaust local memory, and we avoid moving data from one platform to another.
  • It provides the flexibility to select data for any time window, based on interest and need.
  • It helps us quickly check data patterns by plotting with Python libraries such as matplotlib, seaborn, and plotly. The SQL worksheet provides some limited visualization options, but Python extends visualization beyond those limits.
  • It helps us deploy complex machine learning models in the Snowflake environment that are not feasible in SQL.

The training would be helpful if it covers the following, among other things:

  • How to import or update a Python library in the Snowflake environment or worksheet?
  • How to read data from the Snowflake warehouse?
  • An example of some visualizations using data from any schema
  • An example of simple linear regression in Python
  • Anything else that your team is interested in sharing or that may be useful for everyone
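
For the simple linear regression item, a minimal sketch in plain Python could look like the following. The function name `fit_line` and the sample values are made up for illustration (they are not PeMS data). In a Snowflake notebook the inputs would come from a query result, and a library such as scikit-learn could replace the hand-written fit if it is available in the environment.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, NOT real PeMS measurements
occupancy = [0.05, 0.10, 0.15, 0.20, 0.25]
speed = [68.0, 65.0, 62.0, 59.0, 56.0]

slope, intercept = fit_line(occupancy, speed)
print(f"speed ~= {slope:.1f} * occupancy + {intercept:.1f}")
```

The same function works on two columns pulled from a query result, as long as both are passed in as equal-length sequences of numbers.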

jkarpen commented Nov 19, 2024

Note from @ian-r-rose: this training should include a discussion of the fact that, even when doing this analysis directly in Python, they will still want to put guardrails around the size of the data being analyzed to avoid incurring high costs, since their data is so large. We should include guidance on when it makes sense to do this type of analysis in Snowflake directly vs. another option.
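
One possible guardrail is to build every query through a small helper that refuses overly wide time windows and always caps the number of returned rows. This is only a sketch: the limits, the function name `bounded_query`, and the table and column names are hypothetical, not agreed values.

```python
import re
from datetime import date

MAX_WINDOW_DAYS = 31   # assumed cap; tune to the team's cost tolerance
MAX_ROWS = 100_000     # assumed cap on rows returned to Python

# Plain (optionally schema- or database-qualified) identifiers only, so that
# nothing unexpected is interpolated into the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}")


def bounded_query(table, date_column, start, end, max_rows=MAX_ROWS):
    """Build a SELECT limited to [start, end) and to at most max_rows rows."""
    for name in (table, date_column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    window_days = (end - start).days
    if window_days <= 0:
        raise ValueError("end must be after start")
    if window_days > MAX_WINDOW_DAYS:
        raise ValueError(
            f"window of {window_days} days exceeds the {MAX_WINDOW_DAYS}-day cap"
        )
    return (
        f"SELECT * FROM {table} "
        f"WHERE {date_column} >= '{start.isoformat()}' "
        f"AND {date_column} < '{end.isoformat()}' "
        f"LIMIT {int(max_rows)}"
    )


# Hypothetical table and column names, for illustration only
query = bounded_query(
    "ANALYTICS.PUBLIC.STATION_DAILY",
    "SAMPLE_DATE",
    date(2024, 11, 1),
    date(2024, 11, 8),
)
print(query)
```

The date filter narrows what is selected and the `LIMIT` caps what comes back to Python. Actual cost will still depend on the warehouse size and on how the table is organized.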

jkarpen commented Nov 22, 2024

Before starting this, we want to do more fact-finding to better understand the problem, in particular why they currently feel the need to download data for analysis. We will bring this up in the modeling session on 11/27.

jkarpen commented Nov 27, 2024

Notes from discussion with Mintu and team:
Training could cover:

  1. First, how to work with Snowpark to get started with Snowflake notebooks
  2. Second, how to use Snowflake's built-in charts, and doing analysis in notebooks vs. worksheets

Focus more on the notebook style, since notebooks allow in-line visualizations, which may be closer to what the Caltrans team needs.
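
As a rough idea of what a first notebook cell could look like, here is a minimal sketch that pulls a capped sample of a table into pandas through Snowpark. The helper name `load_sample`, the default row cap, and the table and column names in the comments are hypothetical. The Snowpark calls used (`get_active_session`, `Session.table`, `DataFrame.limit`, `DataFrame.to_pandas`) are part of the Snowpark Python API, but how the session is obtained may differ depending on where the code runs.

```python
def load_sample(session, table_name, max_rows=10_000):
    """Return at most max_rows rows of table_name as a pandas DataFrame.

    `session` is expected to be a Snowpark Session. The row cap is applied
    in Snowflake before anything is transferred to the notebook.
    """
    return session.table(table_name).limit(max_rows).to_pandas()


# In a Snowflake notebook the session would typically be obtained like this
# (left as comments so the cell also loads outside Snowflake):
#
#   from snowflake.snowpark.context import get_active_session
#   session = get_active_session()
#   df = load_sample(session, "MY_DB.MY_SCHEMA.MY_TABLE")  # hypothetical name
#   df.plot(x="SAMPLE_DATE", y="VOLUME")  # hypothetical columns; needs matplotlib
```

Keeping the row cap inside the helper means an exploratory cell cannot accidentally pull a full table into notebook memory.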

summer-mothwood commented

@JamesSLogan Here's a notebook I used to explore some data (this was for trying to figure out if the clearinghouse and data relay server were at parity or not). https://app.snowflake.com/vsb79059/dse_caltrans_pems/#/notebooks/TRANSFORM_DEV.PUBLIC.DATA_RELAY_UNION_TEST_STATIONS_SAMPLE (I did not see a 'share notebook' option anywhere, so lmk if that link doesn't work!)
