This project is about designing a database model for Global Super Store to store their data and do some data analysis in order to achieve business goals. It's the final project of the 7th course Advanced Data Modeling of Meta Database Engineer Professional Certificate on Coursera.
Global Super Store mainly stores data about the orders which have been made by customers. There's information about the order itself such as the customer who made the order, the product requested by the customer and its quantity, the cost which the customer have to pay, and information about shipping. Also, there's information about the customer himself, his name and a detailed address of him. Besides the order and customer information, there's information about the product, its name, category, sub-category, and price.
All those information is stored in ONLY ONE dataset. This makes the data difficult to monitor, analyze, and extract insights! Therefore we need a structured database to store all those information in a simple and easy-to-monitor format.
To well-design the database, we start with the Entity Relationship Diagram. This methodology requires dividing the information in the dataset into entities, and defining the relationship between them.
The dataset contains the following entities:
- Orders
- Customers
- Products
- Sales
- Addresses
- Shipping
Let's start the three-level modeling.
-
Conceptual Data Model:
The 3 basic components of Conceptual Data Model are:
- Entity: A real world thing
- Attribute: Characteristics or properties of an entity
- Relationship: Dependency or association between two entities
-
Logical Data Model:
Logical Data Model contains more information about the attributes of each entity such as the data type, as well as the relationship between each 2 entities.
-
Physical Data Model:
Physical Data Model describes a database-specific implementation of the database model. It contains information about the database columns, keys, constraints, and other RDBMS (Relational Database Management System) features.
-
ER-Diagram of the Data Model
The implementation of the ER-Diagram (Entity Relationship Diagram) of the database on MySQL Workbench.
-
Database implementation using Forward Engineer tool
This tool generates a SQL script which creates the database with its tables, columns and constraints from its ERD, and runs this script to have the database ready in MySQL Server.
This is a part from the code that was generated using Forward Engineer tool:
-- MySQL Workbench Forward Engineering SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0; SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0; SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'; -- ----------------------------------------------------- -- Schema global_super_store -- ----------------------------------------------------- -- ----------------------------------------------------- -- Schema global_super_store -- ----------------------------------------------------- CREATE SCHEMA IF NOT EXISTS `global_super_store` DEFAULT CHARACTER SET utf8 ; USE `global_super_store` ; -- ----------------------------------------------------- -- Table `global_super_store`.`Customers` -- ----------------------------------------------------- CREATE TABLE IF NOT EXISTS `global_super_store`.`Customers` ( `CustomerID` VARCHAR(100) NOT NULL, `FullName` VARCHAR(255) NOT NULL, `Segment` VARCHAR(100) NOT NULL, PRIMARY KEY (`CustomerID`)) ENGINE = InnoDB;
After executing the SQL script, the database will be ready to use on the server.
It's much more easier for data analysts to deal with Dimensional Data Models, as they optimizes the database for fast retrieval of the data.
Dimensional Data Model consists of dimensions, the categorical data which we're interested in and give us the context of the analysis, and facts, the collection of measurement and metrics that any business needs to analyze.
The most common schema that any dimensional data model can be built with is the Star Schema. It has the fact table at its center, and all the dimensions around it.
Here, the fact table is Sales, and the dimensions are divided into Location, Time, and Products
The implementation of the Star Schema on MySQL Workbench: