Yali Sassoon edited this page Aug 15, 2013 · 4 revisions

HOME > [SNOWPLOW TECHNICAL DOCUMENTATION](Snowplow technical documentation) > [Storage](storage documentation) > Storage in S3

As standard, the [Enrichment Process](The Enrichment Process) outputs Snowplow data as event files in S3. These are tab-delimited files that are:

  1. Suitable for uploading data directly into Amazon Redshift or PostgreSQL
  2. Suitable for querying directly using big data tools on EMR
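To illustrate the first point: because the files are tab-delimited, they can be loaded into Amazon Redshift with a single `COPY` statement. The bucket path, table name, and credentials below are placeholders, and the target table must already exist with a column layout matching the event files:

```sql
-- Load tab-delimited Snowplow event files from S3 into a Redshift table.
-- 'events' and the S3 path are illustrative, not actual Snowplow names.
COPY events
FROM 's3://your-snowplow-bucket/events/'
CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
DELIMITER '\t';
```

PostgreSQL offers an analogous `COPY ... FROM` command for files downloaded locally, using the same tab delimiter.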

An example: querying the data in Apache Hive

The easiest way to understand the structure of data in S3 is to run some sample queries using something like Apache Hive on EMR. The table definition for the Snowplow event files is given in the repo.
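As a sketch of what such a Hive session might look like — the table name, column subset, and S3 path below are illustrative only; the real column list is the table definition in the repo:

```sql
-- Hypothetical external table over the Snowplow event files in S3.
-- Only a few example columns are shown here.
CREATE EXTERNAL TABLE events (
  app_id STRING,
  collector_tstamp STRING,
  event STRING,
  user_id STRING,
  page_url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://your-snowplow-bucket/events/';

-- Example query: count events per application
SELECT app_id, COUNT(*) AS event_count
FROM events
GROUP BY app_id;
```

Because the table is `EXTERNAL`, Hive queries the files in place in S3 without copying or moving them.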

Going forwards, we plan to move from a flat-file structure to storing Snowplow data using Apache Avro.
