Ingesting data from AWS S3

This README provides a brief guide on how to set up Dozer for real-time data ingestion from an AWS S3 bucket. For a more comprehensive tutorial, please refer to our blog post.

Prerequisites

  • AWS account with access to S3 services
  • AWS CLI installed and configured
  • Python installed
  • Dozer installed

Steps

  1. Generate and Upload Data to S3: Use a Python script to generate a dataset and upload it to an S3 bucket.

    python create_dataset_and_upload_to_s3.py

If you already have a dataset in your S3 bucket, you can skip this step.
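
The script itself is not shown in this README. As a rough idea of what such a script might do, the sketch below generates a small CSV of made-up stock rows and uploads it with boto3. The column names (`ticker`, `date`, `close`), the helper names, and the price range are assumptions for illustration only, not the contents of `create_dataset_and_upload_to_s3.py`.

```python
import csv
import io
import random
from datetime import date, timedelta


def generate_stock_csv(tickers, days, start=date(2023, 1, 1), seed=0):
    """Return CSV text with one row per ticker per day (made-up prices)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ticker", "date", "close"])  # hypothetical schema
    for offset in range(days):
        day = start + timedelta(days=offset)
        for ticker in tickers:
            price = round(rng.uniform(50, 500), 2)
            writer.writerow([ticker, day.isoformat(), price])
    return buf.getvalue()


def upload_csv(csv_text, bucket, key):
    """Upload CSV text to S3 using the credentials configured for the AWS CLI."""
    import boto3  # third-party dependency: pip install boto3

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
```

A call such as `upload_csv(generate_stock_csv(["AAPL", "MSFT"], days=30), "<your-bucket>", "stocks.csv")` would then place one file at the root of your bucket.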

  2. Configure Dozer: Create a YAML configuration file that defines the data sources, transformations, and APIs. Check out the sample Dozer configuration file dozer-config.yaml, which uses the AWS S3 connector.

    connections:
      - config: !S3Storage
          details:
            access_key_id: {{YOUR_ACCESS_KEY}}
            secret_access_key: {{YOUR_SECRET_KEY}}
            region: {{YOUR_REGION}}
            bucket_name: aws-s3-sample-stock-data-dozer
          tables:
            - !Table
              name: stocks
              config: !CSV
                path: . # path to files or folder inside a bucket
                extension: .csv
        name: s3
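
The `{{YOUR_ACCESS_KEY}}`, `{{YOUR_SECRET_KEY}}`, and `{{YOUR_REGION}}` placeholders must be replaced with your own values. If you would rather not paste credentials into the file by hand, one possible approach is a small helper that fills them in from environment variables and writes the result to a separate file. This is only a sketch: `render_config` and `render_file` are hypothetical helpers, not part of Dozer, and it assumes environment variables named exactly like the placeholders.

```python
import os
import re

# Matches {{NAME}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_config(template, values):
    """Replace each {{NAME}} in the template with values[NAME]."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value provided for placeholder {name}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


def render_file(src, dst, values=os.environ):
    """Read the template at src and write the filled-in config to dst."""
    with open(src, encoding="utf-8") as f:
        rendered = render_config(f.read(), values)
    with open(dst, "w", encoding="utf-8") as f:
        f.write(rendered)
```

For example, `render_file("dozer-config.yaml", "dozer-config.local.yaml")` leaves the sample untouched; keep the rendered file out of version control because it contains secrets.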
  3. Run Dozer: Start Dozer by running the following command in the terminal:

    dozer -c dozer-config.yaml
  4. Query the Dozer APIs: Query the Dozer endpoints to get the results of your SQL queries. You can query the cache using gRPC or REST.

    Example query:

    # REST
    curl -X GET http://localhost:8080/analysis/ticker
  5. Append New Data & Query: Dozer automatically detects and ingests new data files added to the bucket, so you can process recurring data without changing any configuration. Upload a new file to the bucket and watch the console log to see Dozer ingest it.
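
The last two steps can also be scripted. The sketch below queries the REST endpoint from the curl example and uploads an additional CSV file under a timestamped key so that it does not overwrite an earlier upload. It assumes the endpoint returns a JSON body and that boto3 is installed; the helper names and the `stocks_<timestamp>.csv` naming scheme are illustrative choices, not Dozer conventions.

```python
import json
import urllib.request
from datetime import datetime, timezone


def endpoint_url(path, base="http://localhost:8080"):
    """Join the REST base URL and an endpoint path such as /analysis/ticker."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def fetch_records(url, timeout=10):
    """GET the endpoint and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def new_object_key(now=None, prefix="stocks"):
    """Build a timestamped key (hypothetical naming scheme) ending in .csv."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now:%Y%m%dT%H%M%S}.csv"


def upload_new_file(local_path, bucket, key=None):
    """Upload a local CSV file to the bucket and return the key that was used."""
    import boto3  # third-party dependency: pip install boto3

    key = key or new_object_key()
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

With Dozer running, calling `fetch_records(endpoint_url("/analysis/ticker"))` before and after `upload_new_file("more_stocks.csv", "<your-bucket>")` should show the results change once the new file has been ingested.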

Additional Information

If you encounter any issues or have suggestions, please file an issue in the issue tracker on our GitHub page or reach out to us on Discord.

Happy coding with Dozer!

Contributing

We love contributions! Please check our Contributing Guidelines if you're interested in helping!