Merge pull request #219 from afelix-95/main

Update labs 06 and 10

afelix-95 authored Nov 28, 2024
2 parents 8068f9f + 334ec45 commit 2ddc590
Showing 12 changed files with 33 additions and 33 deletions.
17 changes: 8 additions & 9 deletions Instructions/Labs/04-ingest-pipeline.md
@@ -18,7 +18,7 @@ This lab will take approximately **60** minutes to complete.

Before working with data in Fabric, create a workspace with the Fabric trial enabled.

1. On the [Microsoft Fabric home page](https://app.fabric.microsoft.com/home?experience=fabric) at `https://app.fabric.microsoft.com/home?experience=fabric`, select **Synapse Data Engineering**.
1. On the [Microsoft Fabric home page](https://app.fabric.microsoft.com/home?experience=fabric) at `https://app.fabric.microsoft.com/home?experience=fabric`, select **Data Engineering**.
1. In the menu bar on the left, select **Workspaces** (the icon looks similar to 🗇).
1. Create a new workspace with a name of your choice, selecting a licensing mode that includes Fabric capacity (*Trial*, *Premium*, or *Fabric*).
1. When your new workspace opens, it should be empty.
@@ -29,19 +29,19 @@ Before working with data in Fabric, create a workspace with the Fabric trial ena

Now that you have a workspace, it's time to create a data lakehouse into which you will ingest data.

1. In the **Synapse Data Engineering** home page, create a new **Lakehouse** with a name of your choice.
1. In the **Data Engineering** home page, create a new **Lakehouse** with a name of your choice.

After a minute or so, a new lakehouse with no **Tables** or **Files** will be created.

1. On the **Lake view** tab in the pane on the left, in the **...** menu for the **Files** node, select **New subfolder** and create a subfolder named **new_data**.
1. On the **Explorer** pane on the left, in the **...** menu for the **Files** node, select **New subfolder** and create a subfolder named **new_data**.

## Create a pipeline

A simple way to ingest data is to use a **Copy Data** activity in a pipeline to extract the data from a source and copy it to a file in the lakehouse.
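Conceptually, the **Copy Data** activity performs an HTTP extract followed by a write into the lakehouse **Files** area. As a rough local stand-in for that behavior (the function name and the local destination path are illustrative, not part of the lab):

```python
import urllib.request
from pathlib import Path

def copy_data(source_url: str, dest_path: str) -> int:
    """Fetch a file from a source URL and land it at dest_path.

    A local sketch of what the Copy Data activity does; in Fabric the
    destination would be a OneLake Files path, not a local folder.
    """
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(source_url) as response:
        payload = response.read()
    dest.write_bytes(payload)
    return len(payload)
```

For example, `copy_data(source_csv_url, "Files/new_data/sales.csv")` mirrors the source and destination you configure in the wizard's steps.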

1. On the **Home** page for your lakehouse, select **Get data** and then select **New data pipeline**, and create a new data pipeline named **Ingest Sales Data**.
2. If the **Copy Data** wizard doesn't open automatically, select **Copy Data > Use copy assistant** in the pipeline editor page.
3. In the **Copy Data** wizard, on the **Choose a data source** page, type HTTP in the search bar and then select **HTTP** in the **New sources** section.
3. In the **Copy Data** wizard, on the **Choose data source** page, type HTTP in the search bar and then select **HTTP** in the **New sources** section.


![Screenshot of the Choose data source page.](./Images/choose-data-source.png)
@@ -66,8 +66,7 @@ A simple way to ingest data is to use a **Copy Data** activity in a pipeline to
- **First row as header**: Selected
- **Compression type**: None
7. Select **Preview data** to see a sample of the data that will be ingested. Then close the data preview and select **Next**.
8. On the **Choose data destination** page, select **OneLake data hub** and then select your existing lakehouse.
9. Set the following data destination options, and then select **Next**:
8. On the **Connect to data destination** page, set the following data destination options, and then select **Next**:
- **Root folder**: Files
- **Folder path name**: new_data
- **File name**: sales.csv
@@ -142,7 +141,7 @@ A simple way to ingest data is to use a **Copy Data** activity in a pipeline to
Now that you've implemented a notebook to transform data and load it into a table, you can incorporate the notebook into a pipeline to create a reusable ETL process.

1. In the hub menu bar on the left select the **Ingest Sales Data** pipeline you created previously.
2. On the **Activities** tab, in the **More activities** list, select **Delete data**. Then position the new **Delete data** activity to the left of the **Copy data** activity and connect its **On completion** output to the **Copy data** activity, as shown here:
2. On the **Activities** tab, in the **All activities** list, select **Delete data**. Then position the new **Delete data** activity to the left of the **Copy data** activity and connect its **On completion** output to the **Copy data** activity, as shown here:

![Screenshot of a pipeline with Delete data and Copy data activities.](./Images/delete-data-activity.png)
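Running **Delete data** before **Copy data** is what makes the pipeline safely rerunnable: the previous sales.csv is cleared so a rerun replaces the file instead of accumulating stale copies. A minimal local sketch of that delete-then-copy pattern (the function and paths are illustrative, not Fabric APIs):

```python
import shutil
from pathlib import Path

def delete_then_copy(source: str, dest_folder: str, file_name: str) -> Path:
    """Remove any previous copy of the target file, then copy in the new one.

    Mirrors the Delete data -> Copy data sequence so reruns are idempotent.
    """
    dest_dir = Path(dest_folder)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / file_name
    target.unlink(missing_ok=True)   # the Delete data step
    shutil.copyfile(source, target)  # the Copy data step
    return target
```

Running it twice with the same file name leaves exactly one file, holding the latest contents.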

@@ -196,5 +195,5 @@ In this exercise, you've learned how to implement a pipeline in Microsoft Fabric
If you've finished exploring your lakehouse, you can delete the workspace you created for this exercise.

1. In the bar on the left, select the icon for your workspace to view all of the items it contains.
2. In the **...** menu on the toolbar, select **Workspace settings**.
3. In the **General** section, select **Remove this workspace**.
1. Select **Workspace settings** and in the **General** section, scroll down and select **Remove this workspace**.
1. Select **Delete** to delete the workspace.
4 changes: 2 additions & 2 deletions Instructions/Labs/05-dataflows-gen2.md
Expand Up @@ -133,5 +133,5 @@ If you've finished exploring dataflows in Microsoft Fabric, you can delete the w

1. Navigate to Microsoft Fabric in your browser.
1. In the bar on the left, select the icon for your workspace to view all of the items it contains.
1. In the **...** menu on the toolbar, select **Workspace settings**.
1. In the **General** section, select **Remove this workspace**.
1. Select **Workspace settings** and in the **General** section, scroll down and select **Remove this workspace**.
1. Select **Delete** to delete the workspace.
21 changes: 11 additions & 10 deletions Instructions/Labs/06-data-warehouse.md
@@ -16,7 +16,7 @@ This lab will take approximately **30** minutes to complete.

Before working with data in Fabric, create a workspace with the Fabric trial enabled.

1. On the [Microsoft Fabric home page](https://app.fabric.microsoft.com/home?experience=fabric) at `https://app.fabric.microsoft.com/home?experience=fabric`, select **Synapse Data Warehouse**.
1. On the [Microsoft Fabric home page](https://app.fabric.microsoft.com/home?experience=fabric) at `https://app.fabric.microsoft.com/home?experience=fabric`, select **Data Warehouse**.
1. In the menu bar on the left, select **Workspaces** (the icon looks similar to 🗇).
1. Create a new workspace with a name of your choice, selecting a licensing mode that includes Fabric capacity (*Trial*, *Premium*, or *Fabric*).
1. When your new workspace opens, it should be empty.
@@ -27,7 +27,7 @@ Before working with data in Fabric, create a workspace with the Fabric trial ena

Now that you have a workspace, it's time to create a data warehouse. The Synapse Data Warehouse home page includes a shortcut to create a new warehouse:

1. In the **Synapse Data Warehouse** home page, create a new **Warehouse** with a name of your choice.
1. In the **Data Warehouse** home page, create a new **Warehouse** with a name of your choice.

After a minute or so, a new warehouse will be created:

@@ -37,7 +37,7 @@ Now that you have a workspace, it's time to create a data warehouse. The Synapse

A warehouse is a relational database in which you can define tables and other objects.

1. In your new warehouse, select the **Create tables with T-SQL** tile, and replace the default SQL code with the following CREATE TABLE statement:
1. In your new warehouse, select the **T-SQL** tile, and use the following CREATE TABLE statement:

```sql
CREATE TABLE dbo.DimProduct
@@ -65,9 +65,8 @@ A warehouse is a relational database in which you can define tables and other ob
```

5. Run the new query to insert three rows into the **DimProduct** table.
6. When the query has finished, select the **Data** tab at the bottom of the page in the data warehouse. In the **Explorer** pane, select the **DimProduct** table and verify that the three rows have been added to the table.
6. When the query has finished, in the **Explorer** pane, select the **DimProduct** table and verify that the three rows have been added to the table.
7. On the **Home** menu tab, use the **New SQL Query** button to create a new query. Then copy and paste the Transact-SQL code from `https://raw.githubusercontent.com/MicrosoftLearning/dp-data/main/create-dw.txt` into the new query pane.
<!-- I had to remove the GO command in this query as well -->
8. Run the query, which creates a simple data warehouse schema and loads some data. The script should take around 30 seconds to run.
9. Use the **Refresh** button on the toolbar to refresh the view. Then in the **Explorer** pane, verify that the **dbo** schema in the data warehouse now contains the following four tables:
- **DimCustomer**
@@ -81,15 +80,17 @@ A warehouse is a relational database in which you can define tables and other ob

A relational data warehouse typically consists of *fact* and *dimension* tables. The fact tables contain numeric measures you can aggregate to analyze business performance (for example, sales revenue), and the dimension tables contain attributes of the entities by which you can aggregate the data (for example, product, customer, or time). In a Microsoft Fabric data warehouse, you can use these keys to define a data model that encapsulates the relationships between the tables.
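To make the fact/dimension split concrete, here is a minimal star-schema query using Python's built-in `sqlite3` with stand-in tables (the table shapes echo the lab's schema, but the sample rows are invented for illustration):

```python
import sqlite3

# In-memory stand-ins for a dimension table and a fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE FactSalesOrder (ProductKey INTEGER, SalesTotal REAL);
    INSERT INTO DimProduct VALUES (1, 'Bike'), (2, 'Helmet');
    INSERT INTO FactSalesOrder VALUES (1, 100.0), (1, 250.0), (2, 40.0);
""")

# Aggregate the fact table's numeric measure by a dimension attribute.
rows = conn.execute("""
    SELECT p.ProductName, SUM(f.SalesTotal) AS Revenue
    FROM FactSalesOrder AS f
    JOIN DimProduct AS p ON f.ProductKey = p.ProductKey
    GROUP BY p.ProductName
    ORDER BY p.ProductName
""").fetchall()
print(rows)  # [('Bike', 350.0), ('Helmet', 40.0)]
```

The join key (`ProductKey`) is exactly the kind of relationship the model view lets you define graphically.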

1. At the bottom of the page in the data warehouse, select the **Model** tab.
1. In the toolbar, select the **Model layouts** button.
2. In the model pane, rearrange the tables in your data warehouse so that the **FactSalesOrder** table is in the middle, like this:

![Screenshot of the data warehouse model page.](./Images/model-dw.png)

> **Note**: The views **frequently_run_queries**, **long_running_queries**, **exec_sessions_history**, and **exec_requests_history** are part of the **queryinsights** schema automatically created by Fabric. It is a feature that provides a holistic view of historical query activity on the SQL analytics endpoint. Since this feature is out of the scope of this exercise, those views should be ignored for now.

3. Drag the **ProductKey** field from the **FactSalesOrder** table and drop it on the **ProductKey** field in the **DimProduct** table. Then confirm the following relationship details:
- **Table 1**: FactSalesOrder
- **From table**: FactSalesOrder
- **Column**: ProductKey
- **Table 2**: DimProduct
- **To table**: DimProduct
- **Column**: ProductKey
- **Cardinality**: Many to one (*:1)
- **Cross filter direction**: Single
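The *Many to one (\*:1)* cardinality above asserts that every ProductKey in FactSalesOrder resolves to exactly one row in DimProduct, while one product may back many fact rows. A hypothetical helper sketching that constraint on raw key lists:

```python
def check_many_to_one(fact_keys, dim_keys):
    """Verify a many-to-one relationship between fact and dimension keys.

    Dimension keys must be unique (the "one" side); every fact key must
    match a dimension key (fact keys may repeat on the "many" side).
    """
    if len(dim_keys) != len(set(dim_keys)):
        raise ValueError("dimension keys are not unique ('one' side violated)")
    missing = set(fact_keys) - set(dim_keys)
    if missing:
        raise ValueError(f"fact keys with no matching dimension row: {missing}")
    return True
```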
@@ -177,7 +178,7 @@ A data warehouse in Microsoft Fabric has many of the same capabilities you may b

Instead of writing SQL code, you can use the graphical query designer to query the tables in your data warehouse. This experience is similar to Power Query online, where you can create data transformation steps with no code. For more complex tasks, you can use Power Query's M (Mashup) language.
1. On the **Home** menu, select **New visual query**.
1. On the **Home** menu, expand the options under **New SQL query** and select **New visual query**.
1. Drag **FactSalesOrder** onto the **canvas**. Notice that a preview of the table is displayed in the **Preview** pane below.
@@ -200,7 +201,7 @@ Instead of writing SQL code, you can use the graphical query designer to query t
You can easily visualize the data in either a single query, or in your data warehouse. Before you visualize, hide columns and/or tables that aren't friendly to report designers.

1. In the **Explorer** pane, select the **Model** view.
1. Select the **Model layouts** button.

1. Hide the following columns in your Fact and Dimension tables that are not necessary to create a report. Note that this does not remove the columns from the model, it simply hides them from view on the report canvas.
1. FactSalesOrder
24 changes: 12 additions & 12 deletions Instructions/Labs/10-ingest-notebooks.md
@@ -20,7 +20,7 @@ Because you're also working with a sample dataset, the optimization doesn't refl

Before working with data in Fabric, create a workspace with the Fabric trial enabled.

1. On the [Microsoft Fabric home page](https://app.fabric.microsoft.com/home?experience=fabric) at `https://app.fabric.microsoft.com/home?experience=fabric`, select **Synapse Data Engineering**.
1. On the [Microsoft Fabric home page](https://app.fabric.microsoft.com/home?experience=fabric) at `https://app.fabric.microsoft.com/home?experience=fabric`, select **Data Engineering**.
1. In the menu bar on the left, select **Workspaces** (the icon looks similar to &#128455;).
1. Create a new workspace with a name of your choice, selecting a licensing mode that includes Fabric capacity (*Trial*, *Premium*, or *Fabric*).
1. When your new workspace opens, it should be empty.
@@ -31,7 +31,7 @@ Before working with data in Fabric, create a workspace with the Fabric trial ena

Start by creating a new lakehouse, and a destination folder in the lakehouse.

1. From your workspace, select **+ New > Lakehouse**, supply a name, and **Create**.
1. From your workspace, select **+ New item > Lakehouse**, supply a name, and **Create**.

> **Note:** It may take a few minutes to create a new lakehouse with no **Tables** or **Files**.
@@ -86,15 +86,15 @@ Create a new Fabric notebook and connect to external data source with PySpark.
1. Insert the following code into a **new code cell**:

```python
    # Declare file name
    file_name = "yellow_taxi"

    # Construct destination path
    output_parquet_path = f"**InsertABFSPathHere**/{file_name}"
    print(output_parquet_path)

    # Load the first 1000 rows as a Parquet file
    blob_df.limit(1000).write.mode("overwrite").parquet(output_parquet_path)
```

1. Add your **RawData** ABFS path and select **&#9655; Run Cell** to write 1000 rows to a yellow_taxi.parquet file.
@@ -122,7 +122,7 @@ Likely, your data ingestion task doesn't end with only loading a file. Delta tab
filtered_df = raw_df.withColumn("dataload_datetime", current_timestamp())

# Filter columns to exclude any NULL values in storeAndFwdFlag
filtered_df = filtered_df.filter(raw_df["storeAndFwdFlag"].isNotNull())
filtered_df = filtered_df.filter(col("storeAndFwdFlag").isNotNull())

# Load the filtered data into a Delta table
table_name = "yellow_taxi"
@@ -179,5 +179,5 @@ In this exercise, you have used notebooks with PySpark in Fabric to load data an
When you're finished exploring, you can delete the workspace you created for this exercise.

1. In the bar on the left, select the icon for your workspace to view all of the items it contains.
2. In the **...** menu on the toolbar, select **Workspace settings**.
3. In the **General** section, select **Remove this workspace**.
1. Select **Workspace settings** and in the **General** section, scroll down and select **Remove this workspace**.
1. Select **Delete** to delete the workspace.
Binary file modified Instructions/Labs/Images/copy-data-pipeline.png
Binary file modified Instructions/Labs/Images/delete-data-activity.png
Binary file modified Instructions/Labs/Images/dw-relationships.png
Binary file modified Instructions/Labs/Images/model-dw.png
Binary file modified Instructions/Labs/Images/new-data-warehouse.png
Binary file modified Instructions/Labs/Images/notebook.png
Binary file modified Instructions/Labs/Images/pipeline-run.png
Binary file modified Instructions/Labs/Images/pipeline.png
