Yali Sassoon edited this page Aug 15, 2013 · 4 revisions

HOME > [SNOWPLOW TECHNICAL DOCUMENTATION](Snowplow technical documentation) > [Storage](storage documentation) > Storage in S3

As standard, the [Enrichment Process](The Enrichment Process) outputs Snowplow data as event files in S3. These are tab-delimited files that are:

  1. Suitable for uploading data directly into Amazon Redshift or PostgreSQL
  2. Suitable for querying directly using big data tools on EMR
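To illustrate the first point: because the files are tab-delimited, they can be loaded into Amazon Redshift with a single `COPY` statement. The bucket path, table name, and credentials below are placeholders, and the target table must already exist with a column layout matching the event files:

```sql
-- Load tab-delimited Snowplow event files from S3 into a Redshift table.
-- 'events' and the S3 path are illustrative, not actual Snowplow names.
COPY events
FROM 's3://your-snowplow-bucket/events/'
CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
DELIMITER '\t';
```

PostgreSQL offers an analogous `COPY ... FROM` command for files downloaded locally, using the same tab delimiter.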

An example: querying the data in Apache Hive

The easiest way to understand the structure of data in S3 is to run some sample queries using something like Apache Hive on EMR. The table definition for the Snowplow event files is given in the repo.
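As a sketch of what such a Hive session might look like — the table name, column subset, and S3 path below are illustrative only; the real column list is the table definition in the repo:

```sql
-- Hypothetical external table over the Snowplow event files in S3.
-- Only a few example columns are shown here.
CREATE EXTERNAL TABLE events (
  app_id STRING,
  collector_tstamp STRING,
  event STRING,
  user_id STRING,
  page_url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://your-snowplow-bucket/events/';

-- Example query: count events per application
SELECT app_id, COUNT(*) AS event_count
FROM events
GROUP BY app_id;
```

Because the table is `EXTERNAL`, Hive queries the files in place in S3 without copying or moving them.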

Going forwards, we plan to move from a flat-file structure to storing Snowplow data using Apache Avro.
