# 3. Scheduling the StorageLoader
- Overview
- Scheduling StorageLoader only
- Scheduling EmrEtlRunner and StorageLoader
- Alternatives to cron
- Next steps
## 1. Overview

Once you have the load process working smoothly, you can schedule a daily (or more frequent) task to automate the storage process.
The standard way of scheduling the load process is as a daily cronjob. We provide two alternative shell scripts for you to use in your scheduling:
- `snowplow-storage-loader.sh`: this script just runs the StorageLoader
- `snowplow-runner-and-loader.sh`: this script runs the EmrEtlRunner immediately followed by the StorageLoader
The second script is recommended if you want to run the StorageLoader immediately after EmrEtlRunner has completed its work.
We consider each scheduling option in turn below.
## 2. Scheduling StorageLoader only

The shell script `/4-storage/storage-loader/bin/snowplow-storage-loader.sh` runs the StorageLoader app only.
You need to edit this script and update the three variables at the top:
rvm_path=/path/to/.rvm # Typically in the $HOME of the user who installed RVM
LOADER_PATH=/path/to/snowplow/4-storage/snowplow-storage-loader
LOADER_CONFIG=/path/to/your-loader-config.yml
So, for example, if you installed RVM as the `admin` user, then you would set:
rvm_path=/home/admin/.rvm
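Before scheduling anything, it is worth checking that the three variables point at real files and directories, because a typo here only surfaces when cron runs the job. The sketch below is a hypothetical helper (`check_loader_env` is our own name, not part of the Snowplow scripts); it assumes an RVM install contains `scripts/rvm`, the file RVM normally asks you to source, and that `LOADER_CONFIG` must be an existing, readable file.

```shell
# Hypothetical pre-flight check (not part of the Snowplow scripts).
# Arguments: the values you set for rvm_path, LOADER_PATH and LOADER_CONFIG.
# Prints one line per problem to stderr; returns 1 if any check fails.
check_loader_env() {
  local rvm=$1 loader=$2 config=$3 status=0

  # Assumption: RVM is loaded by sourcing $rvm_path/scripts/rvm
  if [ ! -f "$rvm/scripts/rvm" ]; then
    echo "rvm_path: $rvm/scripts/rvm not found" >&2
    status=1
  fi

  # LOADER_PATH should be the StorageLoader directory
  if [ ! -d "$loader" ]; then
    echo "LOADER_PATH: $loader is not a directory" >&2
    status=1
  fi

  # LOADER_CONFIG should be your edited YAML configuration file
  if [ ! -f "$config" ] || [ ! -r "$config" ]; then
    echo "LOADER_CONFIG: $config is missing or unreadable" >&2
    status=1
  fi

  return "$status"
}
```

For example, `check_loader_env /home/admin/.rvm /path/to/snowplow/4-storage/snowplow-storage-loader /path/to/your-loader-config.yml` stays silent and returns 0 when all three paths exist, and otherwise reports each problem and returns 1.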
Now, assuming you're using the excellent cronic as a wrapper for your cronjobs, and that both cronic and Bundler are on your path, you can configure your cronjob like so:
0 6 * * * root cronic /path/to/snowplow/4-storage/bin/snowplow-storage-loader.sh
This will run the StorageLoader daily at 6am, emailing any failures to you via cronic. Please make sure that EmrEtlRunner has finished processing your Snowplow events before 6am.
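cron mails you whatever a job prints, so a job that logs on every run would fill your inbox; cronic avoids this by printing nothing unless the job fails. The sketch below illustrates only that idea. `quiet_unless_failed` is a hypothetical name, and it is a simplification: the real cronic also treats any error output as a failure and formats its report differently, so use cronic itself in your crontab.

```shell
# Hypothetical, simplified stand-in for cronic (use the real cronic in cron).
# Runs the given command, capturing stdout and stderr together.
# Success: print nothing, so cron has nothing to mail.
# Failure: print the exit code and the captured output, so cron mails it.
# Unlike cronic, this ignores error output from a command that exits with 0.
quiet_unless_failed() {
  local out rc=0
  out=$("$@" 2>&1) || rc=$?
  if [ "$rc" -ne 0 ]; then
    printf 'Command failed with exit code %s: %s\n' "$rc" "$*"
    printf '%s\n' "$out"
  fi
  return "$rc"
}
```

For example, `quiet_unless_failed /path/to/snowplow/4-storage/bin/snowplow-storage-loader.sh` prints nothing if the load succeeds, and prints the exit code plus everything the script wrote if it fails.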
## 3. Scheduling EmrEtlRunner and StorageLoader

The shell script `/4-storage/storage-loader/bin/snowplow-runner-and-loader.sh` runs EmrEtlRunner, immediately followed by StorageLoader, i.e. it chains them together. At Snowplow, this is the scheduling option we use.
If you use this script, you can delete any separate cronjob for the EmrEtlRunner alone.
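The key property of chaining is that the load step depends on the ETL step. The sketch below shows that control flow, assuming you want StorageLoader skipped whenever EmrEtlRunner exits with a non-zero code. `run_chained` is a hypothetical function, not the contents of `snowplow-runner-and-loader.sh`; its two arguments are command strings standing in for the EmrEtlRunner and StorageLoader invocations.

```shell
# Hypothetical sketch of the chaining logic (not the real
# snowplow-runner-and-loader.sh). Each argument is a shell command string.
run_chained() {
  local runner_cmd=$1 loader_cmd=$2 rc=0

  # Step 1: the ETL step (EmrEtlRunner in the real setup)
  sh -c "$runner_cmd" || rc=$?
  if [ "$rc" -ne 0 ]; then
    # Do not load anything into the database if the ETL step failed
    echo "EmrEtlRunner failed with exit code $rc; StorageLoader not run" >&2
    return "$rc"
  fi

  # Step 2: the load step (StorageLoader), only reached on success;
  # its exit code becomes the function's exit code
  sh -c "$loader_cmd"
}
```

For example, `run_chained 'echo etl' 'echo load'` prints both lines, while `run_chained 'exit 2' 'echo load'` prints only the error message and returns 2, which cronic would then report.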
You need to edit this script and update the five variables at the top:
rvm_path=/path/to/.rvm # Typically in the $HOME of the user who installed RVM
RUNNER_PATH=/path/to/snowplow/3-enrich/snowplow-emr-etl-runner
LOADER_PATH=/path/to/snowplow/4-storage/snowplow-storage-loader
RUNNER_CONFIG=/path/to/your-runner-config.yml
LOADER_CONFIG=/path/to/your-loader-config.yml
So, for example, if you installed RVM as the `admin` user, then you would set:
rvm_path=/home/admin/.rvm
Using cronic as a wrapper, and with cronic and Bundler on your path, configure your cronjob like so:
0 4 * * * root cronic /path/to/snowplow/4-storage/bin/snowplow-runner-and-loader.sh
This will run the ETL job and then the database load daily at 4am, emailing any failures to you via cronic.
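Note that the cron lines above include a user field (`root`). That format is used by the system crontab `/etc/crontab` and by files in `/etc/cron.d/`; a personal crontab edited with `crontab -e` has no user field. The sketch below writes such a file for the combined job using a hypothetical `write_cron_file` helper. It sets `MAILTO`, the address cron mails job output (and hence cronic's failure reports) to, and a `PATH` that assumes cronic and Bundler live in `/usr/local/bin` or `/usr/bin`; adjust both for your system.

```shell
# Hypothetical helper: write a cron.d-style file for the combined job.
# Arguments: target file, e-mail address, hour (0-23), script path.
write_cron_file() {
  local target=$1 email=$2 hour=$3 script=$4

  # Reject hours cron would not accept
  case $hour in
    [0-9]|[01][0-9]|2[0-3]) ;;
    *) echo "hour must be between 0 and 23" >&2; return 1 ;;
  esac

  {
    # cron mails any output from the job here; with cronic,
    # output (and so mail) only appears when the job fails
    echo "MAILTO=$email"
    # cron's default PATH is minimal; these locations are assumptions
    echo "PATH=/usr/local/bin:/usr/bin:/bin"
    # minute hour day-of-month month day-of-week user command
    echo "0 $hour * * * root cronic $script"
  } > "$target"
}
```

Run as root, `write_cron_file /etc/cron.d/snowplow you@example.com 4 /path/to/snowplow/4-storage/bin/snowplow-runner-and-loader.sh` reproduces the 4am schedule. On Debian-based systems, cron ignores files in `/etc/cron.d` whose names contain a dot, so avoid names like `snowplow.cron`.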
## 4. Alternatives to cron

In place of cron, you could schedule StorageLoader using a continuous integration server such as Jenkins, or potentially use the Windows Task Scheduler.
These options are explored in a little more detail in the [Scheduling EmrEtlRunner](3-Scheduling-EmrEtlRunner) guide.
## 5. Next steps

You have set up the StorageLoader! Now you are ready to do some analysis.
Copyright © 2012-2013 Snowplow Analytics Ltd