Skip to content

Latest commit

 

History

History
402 lines (239 loc) · 17.4 KB

File metadata and controls

402 lines (239 loc) · 17.4 KB

Generate Insights from multiple data sources using Watson Studio

In this Code Pattern, we will Generate Insights by integrating data from multiple data sources like Db2 On Cloud, CSV File, Db2 Warehouse, etc using Watson Studio. Telling a story with data usually involves integrating data from multiple sources. Being able to combine data from multiple sources is essential when performing analysis. Here we worked with a few data sources namely Db2 On Cloud, CSV File and Db2 Warehouse, but the power of Watson Studio is that this technique can be applied to other sources like MySQL databases, IBM Db2 Big SQL, Oracle database, PostgreSQL, Microsoft SQL Server, and many more, no matter the dataset size.

In this pattern, we will demonstrate the methodology with the following usecase. A watch manufacturing company XYZ manufactures five types of watchs, in three different branch locations (Manchester, Glasgow and Madrid) which uses different selling methods (Telephone, Visiting the Store and Online method). The sales data for each of these branches are stored in a different data source ( Db2 on Cloud, Db2 Warehouse and csv files ). We will integrate data from all these sources and put it on a single data source (Db2 warehouse). This integrated data will further be used to derive insights and will be visualized on an embedded dashboard. This will help us in interpreting, which product is performing the best and which branch is performing the best.

When the reader has completed this Code Pattern, they will understand how to:

  • Connect and get data from multiple data sources.
  • Integrate data from multiple data sources.
  • Send integrated data to the Db2 Warehouse.
  • Derive insights and visualize on Watson Embedded Dashboard.

Flow

  1. Extract data from local files (csv file).
  2. Extract data from Db2 on cloud.
  3. Integrate the data in Watson Studio.
  4. Send the data to Db2 Warehouse.
  5. Visualize and derive insights using Embedded dashboard.

Watch the Video

]

Pre-requisites

Steps

Please follow the below to setup and run this code pattern.

  1. Clone the repo

  2. Create Watson services with IBM Cloud

  3. Create the notebook

  4. Add the data from local system

  5. Add the Db2 connection

  6. Add the Db2 Warehouse connection

  7. Update the notebook with credentials and Db2 Warehouse table name

  8. Run the notebook

  9. Visualize and Derive Insights using Embedded Dashboard Analytics

1. Clone the repo

Clone this git repo. Else, in a terminal, run:

$ git clone https://github.com/IBM/generate-insights-from-multiple-data-sources.git

We’ll be using the file data/datasets/Manchester.csv,data/datasets/Madrid.csv and data/datasets/Glasgow.csv

2. Create Watson services with IBM Cloud

Create the following services:

NOTE: Starting September 30, 2019, you will not be able to provision new instances of the Db2 Warehouse on Cloud Entry plan.

Alternate for IBM Cloud Entry plan users:

* Use the Db2 on Cloud instace that you created just now, instead of Db2 Warehouse.

  • Db2 Warehouse: Create an Db2 Warehouse instance on your IBM cloud.
  • Watson Studio: Create a Watson Studio instance on your IBM cloud.

3. Create the notebook

4. Add the data from local system

Add the data to the notebook

  • When you clone this repo, you will find three .csv files in data/datasets/.
  • From your project page in Watson Studio, click Find and Add Data (look for the 10/01 icon) and its Files tab.
  • Click Browse and navigate to data/datasets/ and find Manchester.csv on your computer.
  • Add the file to Object storage.

5. Add the Db2 connection

(i) First load some data on Db2.

  • Launch your Db2 on cloud and click on Load, as shown below.

  • Click on browse files and upload Madrid.csv, as shown below.

  • Choose the default schema and create a table MADRID, as shown below.

  • Now click on next and load the data.

  • Make sure you note down the table name. In my case the table name is TZF04421.MADRID.

(ii) Add the Db2 connection to your notebook.

  • In your project, click Add to project and then select Connection tab, as shown below.

  • It will redirect you to new connection page. Here choose Db2 on Cloud, as shown below.

  • Fill in your username, password, hostname and Database. Leave use a secure gateway unchecked.

  • NOTE: You can get username, password, hostname and Database credentials by creating/clicking New Credentials from your Db2 service instance on cloud, as shown below.

6. Add the Db2 Warehouse connection

NOTE: Starting September 30, 2019, you will not be able to provision new instances of the Db2 Warehouse on Cloud Entry plan.

Alternate for IBM Cloud Entry plan users:

* Use the same 'Db2 on Cloud' instace that you created and follow the steps below,

This methodology is similar to step 5.

(i) First load some data on Db2 Warehouse.

  • Launch your Db2 and click on Load.

  • Click on Browse files and upload Glasgow.csv.

  • Choose the default schema and create a table GLASGOW.

  • Now click on next and load the data.

  • Make sure you note down the table name. In my case the table name is DASH5989.ALL.

(ii) Add the Db2 connection to your notebook.

  • In your project, click Add to project and then select Connection tab.

  • It will redirect you to New connection page. Here choose Db2 Warehouse.

  • Fill in your username, password, hostname and Database. Leave Use a secure gateway unchecked.

  • NOTE: You can get username, password, hostname and Database credentials by creating/clicking New Credentials from your Db2 Warehouse service instance on cloud.

7. Update the notebook with credentials and Db2 Warehouse table name.

Add the data in csv file, to the notebook

  • Select the cell below 2.2 Add the data from local system (csv file) section in the notebook to update the credentials for Object Store.
  • Use Find and Add Data (look for the 10/01 icon) and its Files tab. You should see the file names uploaded earlier. Make sure your active cell is the empty one created earlier.
  • Select Insert to code below Manchester.csv.
  • Click Insert Pandas DataFrame from the drop down menu.

  • After inserting, make sure you change the DataFrame name to df1, as shown below. NOTE: This step is very important.

Add the data in Db2 on Cloud, to the notebook

  • Select the cell below 2.3 Add the data from Db2 and Db2 Warehouse section in the notebook to update the connection credentials for Db2.
  • Use Find and Add Data (look for the 10/01 icon) and its Connections tab. You should see the Db2 name which we earlier connected. Make sure your active cell is the empty one created earlier.
  • Select Insert to code below Db2.
  • Click Insert Pandas DataFrame from the drop down menu.
  • Select the schema in which you created the table.
  • Select MADRID table.

  • After inserting, make sure you change the DataFrame name to df2, as shown below. NOTE: This step is very important.

Similarly, Add the data in Db2 Warehouse, to the notebook

  • Select the cell below 2.3 Add the data from Db2 and Db2 Warehouse2 section in the notebook to update the connection credentials for Db2 Warehouse.
  • Use Find and Add Data (look for the 10/01 icon) and its Connections tab. You should see the Db2 Warehouse name which we earlier connected. Make sure your active cell is the empty one created earlier.
  • Select Insert to code below Db2 Warehouse.
  • Click Insert Pandas DataFrame from the drop down menu.
  • Select the schema in which you created the table.
  • Select GLASGOW table.
  • After inserting, make sure you change the DataFrame name to df3. NOTE: This step is very important.

Add the Db2 Warehouse credentials, to the notebook

  • Select the cell below 2.5 Configure to the Db2 Warehouse section in the notebook to update the connection credentials for Db2 Warehouse.
  • Use Find and Add Data (look for the 10/01 icon) and its Connections tab. You should see the Db2 Warehouse name which we earlier connected. Make sure your active cell is the empty one created earlier.
  • Select Insert to code below Db2 Warehouse.
  • Click Insert Credentials from the drop down menu.
  • If the credentials are written as credential_2 change them to credentials_1. Make sure that the credentials name is credentials_1. NOTE: This step is very important.

  • Make sure you change the schema name DASHXXXXX as shown below in step 3 from the notebook, with the username of your Db2 Warehouse(In capital letters).

8. Run the notebook

When a notebook is executed, what is actually happening is that each code cell in the notebook is executed, in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

  • A blank, this indicates that the cell has never been executed.
  • A number, this number represents the relative order this code step was executed.
  • A *, this indicates that the cell is currently executing.

There are several ways to execute the code cells in your notebook:

  • One cell at a time.
    • Select the cell, and then press the Play button in the toolbar.
  • Batch mode, in sequential order.
    • From the Cell menu bar, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, that will start executing from the first cell under the currently selected cell, and then continue executing all cells that follow.
  • At a scheduled time.
    • Press the Schedule button located in the top right section of your notebook panel. Here you can schedule your notebook to be executed once at some future time, or repeatedly at your specified interval.

For this Notebook, you can simply Run All cells.

Check whether the table is created.

  • A table will be created on Db2 Warehouse, after you run this python file
  • In my case the table created in Db2 Warehouse is DASH5989.ALL.
  • Go to your Db2 warehouse and check for a file with table name ALL in default schema.
  • Now the data is ready for visualisation.

9. Visualize and Derive Insights using Embedded Dashboard Analytics

Embedded Dashboard Analytics

IBM® Cognos Dashboard Embedded is a new, API-based solution that lets developers easily add end-to-end data visualization capabilities to their applications so users can create visualizations that feel like part of the app. In this pattern we have used this service to consume the derived insights(from db2 warehouse) followed by interactive dashboards which produces visualizations directly from your data in real-time.

Key Features of IBM Cognos Dashboard Embedded are as follows:

  • Live connection to underlying data
  • Interactive dashboards produce visualizations directly from your data in real-time.
  • Smart creation of visualizations
  • Smart data analysis and visualization capabilities help users discover underlying patterns and meanings in their data.
  • Interactive exploration of data
  • Data can be explored using filtering and navigation paths.
  • Embedded in your application
  • Because dashboards are fully embedded, dashboards are integrated into your application's context, keeping users engaged.

Steps to create a Dashboard

(i) Provisioning a Dashboard Embedded analytics service Instance to the Watson Studio Project

  • Click on Add to Project button to add the Embedded Dashboard Analytics service.

(ii) Creating the Embedded Dashboard service and connections

  • Create the New Embedded Dashboard service or if you already have one then from the drop down prompt, select the Dashboard Embedded service from the list. And click on Save button in the bottom.

  • Select the Template for your Dashboard.

  • Click on selected sources + button to connect to the DB2 Warehouse.

  • Connect to the DB2 Warehouse database (DASH5989). See below screenshot for details.

  • After connecting to the DB2 Warehouse database, you will see All as a new datasource under the Selected Sources.

  • Click on All data source and now you can see the column names displayed vertically under All. These columns can be used for dashboard as a source.

(iii) Creating the Dashboard

We will create 2 tabs (Gross Profit and Revenue Analysis & Unit Sold Analysis).

Under Gross Profit and Revenue Analysis tab, we would create 6 widgets.

  • Gross Profit analysis by year widget
  • Gross Profit analysis by Store Location widget
  • Revenue Generated by Order Method
  • Revenue Generated by Store Location
  • Gross Profit Analysis by Order Method
  • Types of Watches sold Analysis by Order Method
Gross Profit analysis by year widget
  • To create this widget, firstly click on the visualizations icon and select and drag the Word Cloud chart to the dashboard pane. See below screenshot for details.

Gross Profit analysis by Store Location widget
  • Add Store Location and Gross Profit to the Word Cloud Chart. See screenshot for details.

Revenue Generated by Order Method widget
  • Add Store Location and Order Method to the Bar chart. See screenshot for details.

Revenue Generated by Store Location
  • Add Store Location and Revenue Generated to the Tree Map chart. See screenshot for details.

(iv) Share the Dashboard link

  • This dashboard once completed with required visualizations can now be shared by clicking on the share url icon to copy the url. See below screenshot for details.

  • Copy the url and you can share to anyone whom you think is relevant. See below screenshot for details.

(v) Analyze the Dashboard

With the help of the above basic steps to build the widgets(charts), one would be able to build desired charts to find valuable insights from multiple data sources. Also, with the interactive dashboards which produces visualizations directly from your data in real-time, one can slice and dice data from various perspectives. For example, the dashboard that we built can view Gross Profit analysis by year, Gross Profit analysis by Store Location, Revenue Generated by Order Method all in the form of charts.This will help us in interpreting, which product is performing the best and which branch is performing the best. Also, this dashboard service now gives the complete insights of the data which is a merge of multiple data sources.

Sample output

Gross Profit Analysis sample output of the dashboard.

  • Unit Sold Analysis sample output of the dashboard.

Troubleshooting

Refer to Troubleshooting.

License

Apache 2.0