Skip to content
This repository has been archived by the owner on Aug 28, 2018. It is now read-only.

ibm-cloud-streaming-retail-demo/dsx-streaming-pipeline-mh-to-db2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

Introduction

The purpose of this project is to receive invoice data from Message Hub and persist it to a DB2 Warehouse using IBM WDP Streaming Pipelines

Dependencies

Prerequisites

Develop

None of these steps have been developed.

  • Extract dimension tables from OnlineRetail.csv:

    • County: CountryID int | CountryName string
    • Product: StockID int | StockCode string | Dscription string
    • Time: TBC
    • Customer: CustomerID int | CustomerName string (the input data only has customer id, so manually create customer (company) names or use a tool like faker: https://pypi.python.org/pypi/fake-factory/0.2)
    • Store: StoreID int | StoreName string (the input date doesn't have storename - we could just use geographic place names, e.g. Bristol, London, ...)
  • Write python code to extract the above dimension tables and create the DB2 DDL and insert statements.

  • Write fact table DDL, e.g.

    • TransactionID int | CountryID int | ProductID int | TimeID int | CustomerID int | StoreID int | Quantity int | UnitPrice decimal
  • Streaming Pipelines

    • Create a pipeline: Message Hub -> Python Code -> DB2
    • Python code:
      • In the init(state) method, lookup and hold in memory all of the above dimension tables
      • Map each record flowing through to retrieve the ID fields for each of the dimension tables

Description

This project uses type 1 slowly changing dimensions. A future improvement will be to use more sophisticated slowly changing dimensions.

About

Receive invoice data from Message Hub and persist to DB2 Warehouse

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published