Ane edited this page Aug 2, 2022 · 6 revisions

Welcome to the boulder wiki!

This wiki contains how-to steps to develop the project, for internal use.

Enable virtual environment

Pyenv

# List virtual environments
pyenv virtualenvs
# Activate the virtual environment
pyenv activate boulderenv
# Deactivate it when done
pyenv deactivate

Analytics

Backend Lambda

Install Node.js and the AWS CDK. Docker must be running to deploy the Python Lambda function.

export AWS_PROFILE=XXXX
cdk deploy

Front-end

Install the dependencies:

pip install -r requirements.txt

To run Streamlit locally, first set the AWS profile that has access to the dataset in AWS:

export AWS_PROFILE=my_profile
streamlit run app.py

Front-end deployment: Heroku

  1. Create an app in Heroku and give it a name; in this case, bouldern.
  2. Create a Dockerfile with the Streamlit-specific commands.
  3. Set web and backend in heroku.yml.
  4. Create the environment variables in Heroku: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and OWM_API (used by PyOWM).
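
Before pushing, it can help to confirm that the variables from step 4 are actually set. The following is a minimal sketch, assuming the app reads them from the process environment under exactly these names; `missing_env_vars` is a hypothetical helper, not part of the project.

```python
import os

# Variable names taken from step 4 above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "OWM_API")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.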

Log in to Heroku from the terminal

# Attach the project to the Heroku app
heroku git:remote -a bouldern
# Log in to the Heroku container registry; Docker must be running
heroku container:login

Push and release project in Heroku

# push changes to heroku
heroku container:push web
# release app
heroku container:release web
# check logs
heroku logs --tail

Alternatively, and as it is set up now, you can connect Heroku to GitHub so that commits to the main branch trigger a Heroku deployment.


Resources

Legal info about scraping/crawling

  1. Web Scraping and Crawling Are Perfectly Legal, Right?
  2. The robots.txt file does not prohibit scraping the main web page.
  3. There are no prohibitions in the AGB (terms and conditions) or the Datenschutzerklärung (privacy policy). There are no Nutzungsbedingungen (terms of use).
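
The robots.txt check in item 2 can be repeated with Python's standard library. This is a minimal sketch, assuming a made-up robots.txt and example.com URLs purely for illustration; the real site's file is not reproduced here, and `is_allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The main page is not covered by the Disallow rule; the /admin/ path is.
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/admin/users"))
```

To check a live site instead, `RobotFileParser` also offers `set_url` and `read`, which download the file before `can_fetch` is called.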

Unify interval data

Issue 23 of the repo was fixed with this script:

import pandas as pd

df = pd.read_csv('boulderdata.csv')

# Remap the minutes in current_time: :15 -> :20 and :45 -> :40, then drop the :30 rows
df['current_time'] = df['current_time'].apply(lambda x: x.replace(':15', ':20').replace(':45', ':40'))
df = df[~df['current_time'].str.contains(':30')]

# Use current_time as the index so that no extra index column is written
df = df.set_index('current_time')
df.to_csv('boulderdata.csv')
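
The same minute mapping can be sketched without pandas, which makes the rule easier to test in isolation. This assumes that `current_time` values end in `HH:MM` with no seconds field; the sample timestamps are invented, and `unify_minute` is a hypothetical helper, not the script used for Issue 23. Unlike the string replacement above, it only touches the final minute field.

```python
import re

# Minute remapping from the script above: :15 -> :20, :45 -> :40, drop :30.
MINUTE_MAP = {"15": "20", "45": "40"}
DROPPED_MINUTE = "30"

# Assumption: the timestamp ends in HH:MM, e.g. "2022-08-02 14:15".
_TIME_RE = re.compile(r"^(.*\d{1,2}):(\d{2})$")


def unify_minute(timestamp):
    """Return the timestamp with its minute remapped, or None if the row is dropped."""
    match = _TIME_RE.match(timestamp)
    if match is None:
        return timestamp  # leave unrecognised values untouched
    prefix, minute = match.groups()
    if minute == DROPPED_MINUTE:
        return None
    return f"{prefix}:{MINUTE_MAP.get(minute, minute)}"


# Invented sample rows, for illustration only.
rows = ["2022-08-02 14:00", "2022-08-02 14:15", "2022-08-02 14:30", "2022-08-02 14:45"]
unified = [t for t in map(unify_minute, rows) if t is not None]
print(unified)
```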