+
+This feature is especially useful in helping you stay on top of any upstream changes that could impact the assets you or your stakeholders rely on. It eliminates the need for you and your team to manually check for upstream changes, or for upstream stakeholders to identify and notify impacted users.
+As a user, you can subscribe to and receive notifications about changes such as deprecations, schema changes, ownership changes, assertions, or incidents. You’ll always be in the know about potential data quality issues so you can proactively manage your data resources.
+
+## Prerequisites
+
+Once you have [configured Slack within your DataHub instance](saas-slack-setup.md), you will be able to subscribe to any Entity in DataHub and begin receiving notifications via DM.
+If you want to create and manage group-level Subscriptions for your team, you will need [the following privileges](../../docs/authorization/roles.md#role-privileges):
+
+- Manage Group Notification Settings
+- Manage Group Subscriptions
+
+## Using DataHub’s Subscriptions and Notifications Feature
+
+The first step is identifying the assets you want to subscribe to.
+DataHub’s [Lineage and Impact Analysis features](../../docs/act-on-metadata/impact-analysis.md#lineage-impact-analysis-setup-prerequisites-and-permissions) can help you identify upstream entities that could impact the assets you use and are responsible for.
+You can use the Subscriptions and Notifications feature to sign up for updates for your entire team, or just for yourself.
+
+### Subscribing Your Team/Group to Notifications
+
+The dropdown menu next to the Subscribe button lets you choose who the subscription is for. To create a group subscription, click on **Manage Group Subscriptions**.
+
+
+
+
+
+Next, customize the group’s subscriptions by selecting the types of changes you want the group to be notified about.
+
+
+
+
+
+Next, connect to Slack. Currently, Acryl’s Subscriptions and Notifications feature integrates only with Slack. Add your group’s Slack Channel ID so notifications are delivered to the right channel.
+(You can find the Channel ID in the About section of your channel in Slack.)
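+
+If you prefer to look the Channel ID up programmatically, the following is a minimal sketch using Slack's `conversations.list` Web API. It assumes a bot token with the `channels:read` (and, for private channels, `groups:read`) scope; the channel name `data-alerts` is only a placeholder.
+
+```shell
+# Print the ID of the (hypothetical) #data-alerts channel
+curl -s -H "Authorization: Bearer $SLACK_BOT_TOKEN" \
+  "https://slack.com/api/conversations.list?types=public_channel,private_channel&limit=200" \
+  | jq -r '.channels[] | select(.name == "data-alerts") | .id'
+```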
+
+
+
+
+
+### Individually Subscribing to an Entity
+
+Select the **Subscribe Me** option in the Subscriptions dropdown menu.
+
+
+
+
+
+Pick the updates you want to be notified about, and connect your Slack account by using your Slack Member ID.
+
+
+
+
+
+:::note
+You can find your Slack Member ID in your profile settings.
+
+
+
+
+:::
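+
+If you would rather resolve your Member ID via the Slack API, here is a minimal sketch using the `users.lookupByEmail` method. It assumes a bot token with the `users:read.email` scope; the email address is only a placeholder.
+
+```shell
+# Resolve a Slack Member ID (e.g. U0123456789) from an email address
+curl -s -H "Authorization: Bearer $SLACK_BOT_TOKEN" \
+  "https://slack.com/api/users.lookupByEmail?email=jane.doe@example.com" \
+  | jq -r '.user.id'
+```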
+
+### Managing Your Subscriptions
+
+You can enable, disable, or manage notifications at any time to ensure that you receive relevant updates.
+
+Simply use the dropdown menu next to the Subscribe button to unsubscribe from the asset, or to modify your subscription (for example, to change which updates you want to be notified about).
+
+
+
+
+
+You can also view and manage your subscriptions from your DataHub settings page.
+
+
+
+
+
+You can view and manage the group’s subscriptions on the group’s page on DataHub.
+
+
+
+
+
+## FAQ
+
+
+
+**What changes can I be notified about using this feature?**
+
+You can subscribe to Deprecations, Assertion status changes, Incident status changes, Schema changes, Ownership changes, Glossary Term changes, and Tag changes.
+
+
+
+
+
+
+
+**What if I no longer want to receive updates about a data asset?**
+
+You can unsubscribe from any asset to stop receiving notifications about it. On the asset’s DataHub page, simply use the dropdown menu next to the Subscribe button to unsubscribe from the asset.
+
+
+
+
+
+
+
+
+**What if I want to be notified about different changes?**
+
+To modify your subscription, use the dropdown menu next to the Subscribe button to modify the changes you want to be notified about.
+
+
+## Reference
+
+- [DataHub Blog - Simplifying Data Monitoring & Management with Subscriptions and Notifications with Acryl DataHub](https://www.acryldata.io/blog/simplifying-data-monitoring-and-management-with-subscriptions-and-notifications-with-acryl-datahub)
+- Video Guide - Getting Started with Subscription & Notifications
+
From f42cb95b928c071b8309cf7c3e9a0fe8b41d3a90 Mon Sep 17 00:00:00 2001
From: Hyejin Yoon <0327jane@gmail.com>
Date: Thu, 2 Nov 2023 17:46:49 +0900
Subject: [PATCH 17/34] docs: unify oidc guides using tabs (#9068)
Co-authored-by: Harshal Sheth
---
docs-website/sidebars.js | 11 +-
.../guides/sso/configure-oidc-behind-proxy.md | 18 +-
.../guides/sso/configure-oidc-react-azure.md | 127 -------
.../guides/sso/configure-oidc-react-google.md | 118 ------
.../guides/sso/configure-oidc-react-okta.md | 124 ------
.../guides/sso/configure-oidc-react.md | 355 +++++++++++++-----
6 files changed, 263 insertions(+), 490 deletions(-)
delete mode 100644 docs/authentication/guides/sso/configure-oidc-react-azure.md
delete mode 100644 docs/authentication/guides/sso/configure-oidc-react-google.md
delete mode 100644 docs/authentication/guides/sso/configure-oidc-react-okta.md
diff --git a/docs-website/sidebars.js b/docs-website/sidebars.js
index ab4c1311d5fc7..9cc035f3e29e0 100644
--- a/docs-website/sidebars.js
+++ b/docs-website/sidebars.js
@@ -171,15 +171,8 @@ module.exports = {
{
"Frontend Authentication": [
"docs/authentication/guides/jaas",
- {
- "OIDC Authentication": [
- "docs/authentication/guides/sso/configure-oidc-react",
- "docs/authentication/guides/sso/configure-oidc-react-google",
- "docs/authentication/guides/sso/configure-oidc-react-okta",
- "docs/authentication/guides/sso/configure-oidc-react-azure",
- "docs/authentication/guides/sso/configure-oidc-behind-proxy",
- ],
- },
+ "docs/authentication/guides/sso/configure-oidc-react",
+ "docs/authentication/guides/sso/configure-oidc-behind-proxy",
],
},
"docs/authentication/introducing-metadata-service-authentication",
diff --git a/docs/authentication/guides/sso/configure-oidc-behind-proxy.md b/docs/authentication/guides/sso/configure-oidc-behind-proxy.md
index c998816e04735..684bf768f2baf 100644
--- a/docs/authentication/guides/sso/configure-oidc-behind-proxy.md
+++ b/docs/authentication/guides/sso/configure-oidc-behind-proxy.md
@@ -1,8 +1,9 @@
-# Configuring Frontend to use a Proxy when communicating with SSO Provider
-*Authored on 22/08/2023*
+# OIDC Proxy Configuration
-The `datahub-frontend-react` server can be configured to use an http proxy when retrieving the openid-configuration.
-This can be needed if your infrastructure is locked down and disallows connectivity by default, using proxies for fine-grained egress control.
+_Authored on 22/08/2023_
+
+The `datahub-frontend-react` server can be configured to use an http proxy when retrieving the openid-configuration.
+This can be needed if your infrastructure is locked down and disallows connectivity by default, using proxies for fine-grained egress control.
## Configure http proxy and non proxy hosts
@@ -17,7 +18,8 @@ HTTP_NON_PROXY_HOSTS=localhost|datahub-gms (or any other hosts that you would li
```
## Optional: provide custom truststore
-If your upstream proxy performs SSL termination to inspect traffic, this will result in different (self-signed) certificates for HTTPS connections.
+
+If your upstream proxy performs SSL termination to inspect traffic, this will result in different (self-signed) certificates for HTTPS connections.
The default truststore used in the `datahub-frontend-react` docker image will not trust these kinds of connections.
To address this, you can copy or mount your own truststore (provided by the proxy or network administrators) into the docker container.
@@ -36,8 +38,8 @@ FROM linkedin/datahub-frontend-react:
COPY /truststore-directory /certificates
```
-Building this Dockerfile will result in your own custom docker image on your local machine.
-You will then be able to tag it, publish it to your own registry, etc.
+Building this Dockerfile will result in your own custom docker image on your local machine.
+You will then be able to tag it, publish it to your own registry, etc.
#### Option b) Mount truststore from your host machine using a docker volume
@@ -51,7 +53,7 @@ Adapt your docker-compose.yml to include a new volume mount in the `datahub-fron
- /truststore-directory:/certificates
```
-### Reference new truststore
+### Reference new truststore
Add the following environment values to the `datahub-frontend-react` container:
diff --git a/docs/authentication/guides/sso/configure-oidc-react-azure.md b/docs/authentication/guides/sso/configure-oidc-react-azure.md
deleted file mode 100644
index 177387327c0e8..0000000000000
--- a/docs/authentication/guides/sso/configure-oidc-react-azure.md
+++ /dev/null
@@ -1,127 +0,0 @@
-# Configuring Azure Authentication for React App (OIDC)
-*Authored on 21/12/2021*
-
-`datahub-frontend` server can be configured to authenticate users over OpenID Connect (OIDC). As such, it can be configured to
-delegate authentication responsibility to identity providers like Microsoft Azure.
-
-This guide will provide steps for configuring DataHub authentication using Microsoft Azure.
-
-:::caution
-Even when OIDC is configured, the root user can still login without OIDC by going
-to `/login` URL endpoint. It is recommended that you don't use the default
-credentials by mounting a different file in the front end container. To do this
-please see [this guide](../jaas.md) to mount a custom user.props file for a JAAS authenticated deployment.
-:::
-
-## Steps
-
-### 1. Create an application registration in Microsoft Azure portal
-
-a. Using an account linked to your organization, navigate to the [Microsoft Azure Portal](https://portal.azure.com).
-
-b. Select **App registrations**, then **New registration** to register a new app.
-
-c. Name your app registration and choose who can access your application.
-
-d. Select `Web` as the **Redirect URI** type and enter the following:
-```
-https://your-datahub-domain.com/callback/oidc
-```
-If you are just testing locally, the following can be used: `http://localhost:9002/callback/oidc`.
-Azure supports more than one redirect URI, so both can be configured at the same time from the **Authentication** tab once the registration is complete.
-
-At this point, your app registration should look like the following:
-
-
-
-
-
-
-
-e. Click **Register**.
-
-### 2. Configure Authentication (optional)
-
-Once registration is done, you will land on the app registration **Overview** tab. On the left-side navigation bar, click on **Authentication** under **Manage** and add extra redirect URIs if need be (if you want to support both local testing and Azure deployments).
-
-
-
-
-
-
-
-Click **Save**.
-
-### 3. Configure Certificates & secrets
-
-On the left-side navigation bar, click on **Certificates & secrets** under **Manage**.
-Select **Client secrets**, then **New client secret**. Type in a meaningful description for your secret and select an expiry. Click the **Add** button when you are done.
-
-**IMPORTANT:** Copy the `value` of your newly create secret since Azure will never display its value afterwards.
-
-
-
-
-
-
-
-### 4. Configure API permissions
-
-On the left-side navigation bar, click on **API permissions** under **Manage**. DataHub requires the following four Microsoft Graph APIs:
-
-1. `User.Read` *(should be already configured)*
-2. `profile`
-3. `email`
-4. `openid`
-
-Click on **Add a permission**, then from the **Microsoft APIs** tab select **Microsoft Graph**, then **Delegated permissions**. From the **OpenId permissions** category, select `email`, `openid`, `profile` and click **Add permissions**.
-
-At this point, you should be looking at a screen like the following:
-
-
-
-
-
-
-
-### 5. Obtain Application (Client) ID
-
-On the left-side navigation bar, go back to the **Overview** tab. You should see the `Application (client) ID`. Save its value for the next step.
-
-### 6. Obtain Discovery URI
-
-On the same page, you should see a `Directory (tenant) ID`. Your OIDC discovery URI will be formatted as follows:
-
-```
-https://login.microsoftonline.com/{tenant ID}/v2.0/.well-known/openid-configuration
-```
-
-### 7. Configure `datahub-frontend` to enable OIDC authentication
-
-a. Open the file `docker/datahub-frontend/env/docker.env`
-
-b. Add the following configuration values to the file:
-
-```
-AUTH_OIDC_ENABLED=true
-AUTH_OIDC_CLIENT_ID=your-client-id
-AUTH_OIDC_CLIENT_SECRET=your-client-secret
-AUTH_OIDC_DISCOVERY_URI=https://login.microsoftonline.com/{tenant ID}/v2.0/.well-known/openid-configuration
-AUTH_OIDC_BASE_URL=your-datahub-url
-AUTH_OIDC_SCOPE="openid profile email"
-```
-
-Replacing the placeholders above with the client id (step 5), client secret (step 3) and tenant ID (step 6) received from Microsoft Azure.
-
-### 9. Restart `datahub-frontend-react` docker container
-
-Now, simply restart the `datahub-frontend-react` container to enable the integration.
-
-```
-docker-compose -p datahub -f docker-compose.yml -f docker-compose.override.yml up datahub-frontend-react
-```
-
-Navigate to your DataHub domain to see SSO in action.
-
-## Resources
-- [Microsoft identity platform and OpenID Connect protocol](https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-protocols-oidc/)
\ No newline at end of file
diff --git a/docs/authentication/guides/sso/configure-oidc-react-google.md b/docs/authentication/guides/sso/configure-oidc-react-google.md
deleted file mode 100644
index af62185e6e787..0000000000000
--- a/docs/authentication/guides/sso/configure-oidc-react-google.md
+++ /dev/null
@@ -1,118 +0,0 @@
-# Configuring Google Authentication for React App (OIDC)
-*Authored on 3/10/2021*
-
-`datahub-frontend` server can be configured to authenticate users over OpenID Connect (OIDC). As such, it can be configured to delegate
-authentication responsibility to identity providers like Google.
-
-This guide will provide steps for configuring DataHub authentication using Google.
-
-:::caution
-Even when OIDC is configured, the root user can still login without OIDC by going
-to `/login` URL endpoint. It is recommended that you don't use the default
-credentials by mounting a different file in the front end container. To do this
-please see [this guide](../jaas.md) to mount a custom user.props file for a JAAS authenticated deployment.
-:::
-
-## Steps
-
-### 1. Create a project in the Google API Console
-
-Using an account linked to your organization, navigate to the [Google API Console](https://console.developers.google.com/) and select **New project**.
-Within this project, we will configure the OAuth2.0 screen and credentials.
-
-### 2. Create OAuth2.0 consent screen
-
-a. Navigate to `OAuth consent screen`. This is where you'll configure the screen your users see when attempting to
-log in to DataHub.
-
-b. Select `Internal` (if you only want your company users to have access) and then click **Create**.
-Note that in order to complete this step you should be logged into a Google account associated with your organization.
-
-c. Fill out the details in the App Information & Domain sections. Make sure the 'Application Home Page' provided matches where DataHub is deployed
-at your organization.
-
-
-
-
-
-
-
-Once you've completed this, **Save & Continue**.
-
-d. Configure the scopes: Next, click **Add or Remove Scopes**. Select the following scopes:
-
- - `.../auth/userinfo.email`
- - `.../auth/userinfo.profile`
- - `openid`
-
-Once you've selected these, **Save & Continue**.
-
-### 3. Configure client credentials
-
-Now navigate to the **Credentials** tab. This is where you'll obtain your client id & secret, as well as configure info
-like the redirect URI used after a user is authenticated.
-
-a. Click **Create Credentials** & select `OAuth client ID` as the credential type.
-
-b. On the following screen, select `Web application` as your Application Type.
-
-c. Add the domain where DataHub is hosted to your 'Authorized Javascript Origins'.
-
-```
-https://your-datahub-domain.com
-```
-
-d. Add the domain where DataHub is hosted with the path `/callback/oidc` appended to 'Authorized Redirect URLs'.
-
-```
-https://your-datahub-domain.com/callback/oidc
-```
-
-e. Click **Create**
-
-f. You will now receive a pair of values, a client id and a client secret. Bookmark these for the next step.
-
-At this point, you should be looking at a screen like the following:
-
-
-
-
-
-
-
-Success!
-
-### 4. Configure `datahub-frontend` to enable OIDC authentication
-
-a. Open the file `docker/datahub-frontend/env/docker.env`
-
-b. Add the following configuration values to the file:
-
-```
-AUTH_OIDC_ENABLED=true
-AUTH_OIDC_CLIENT_ID=your-client-id
-AUTH_OIDC_CLIENT_SECRET=your-client-secret
-AUTH_OIDC_DISCOVERY_URI=https://accounts.google.com/.well-known/openid-configuration
-AUTH_OIDC_BASE_URL=your-datahub-url
-AUTH_OIDC_SCOPE="openid profile email"
-AUTH_OIDC_USER_NAME_CLAIM=email
-AUTH_OIDC_USER_NAME_CLAIM_REGEX=([^@]+)
-```
-
-Replacing the placeholders above with the client id & client secret received from Google in Step 3f.
-
-
-### 5. Restart `datahub-frontend-react` docker container
-
-Now, simply restart the `datahub-frontend-react` container to enable the integration.
-
-```
-docker-compose -p datahub -f docker-compose.yml -f docker-compose.override.yml up datahub-frontend-react
-```
-
-Navigate to your DataHub domain to see SSO in action.
-
-
-## References
-
-- [OpenID Connect in Google Identity](https://developers.google.com/identity/protocols/oauth2/openid-connect)
\ No newline at end of file
diff --git a/docs/authentication/guides/sso/configure-oidc-react-okta.md b/docs/authentication/guides/sso/configure-oidc-react-okta.md
deleted file mode 100644
index 320b887a28f16..0000000000000
--- a/docs/authentication/guides/sso/configure-oidc-react-okta.md
+++ /dev/null
@@ -1,124 +0,0 @@
-# Configuring Okta Authentication for React App (OIDC)
-*Authored on 3/10/2021*
-
-`datahub-frontend` server can be configured to authenticate users over OpenID Connect (OIDC). As such, it can be configured to
-delegate authentication responsibility to identity providers like Okta.
-
-This guide will provide steps for configuring DataHub authentication using Okta.
-
-:::caution
-Even when OIDC is configured, the root user can still login without OIDC by going
-to `/login` URL endpoint. It is recommended that you don't use the default
-credentials by mounting a different file in the front end container. To do this
-please see [this guide](../jaas.md) to mount a custom user.props file for a JAAS authenticated deployment.
-:::
-
-## Steps
-
-### 1. Create an application in Okta Developer Console
-
-a. Log in to your Okta admin account & navigate to the developer console
-
-b. Select **Applications**, then **Add Application**, the **Create New App** to create a new app.
-
-c. Select `Web` as the **Platform**, and `OpenID Connect` as the **Sign on method**
-
-d. Click **Create**
-
-e. Under 'General Settings', name your application
-
-f. Below, add a **Login Redirect URI**. This should be formatted as
-
-```
-https://your-datahub-domain.com/callback/oidc
-```
-
-If you're just testing locally, this can be `http://localhost:9002/callback/oidc`.
-
-g. Below, add a **Logout Redirect URI**. This should be formatted as
-
-```
-https://your-datahub-domain.com
-```
-
-h. [Optional] If you're enabling DataHub login as an Okta tile, you'll need to provide the **Initiate Login URI**. You
-can set if to
-
-```
-https://your-datahub-domain.com/authenticate
-```
-
-If you're just testing locally, this can be `http://localhost:9002`.
-
-i. Click **Save**
-
-
-### 2. Obtain Client Credentials
-
-On the subsequent screen, you should see the client credentials. Bookmark the `Client id` and `Client secret` for the next step.
-
-### 3. Obtain Discovery URI
-
-On the same page, you should see an `Okta Domain`. Your OIDC discovery URI will be formatted as follows:
-
-```
-https://your-okta-domain.com/.well-known/openid-configuration
-```
-
-for example, `https://dev-33231928.okta.com/.well-known/openid-configuration`.
-
-At this point, you should be looking at a screen like the following:
-
-
-
-
-
-
-
-
-
-
-
-
-Success!
-
-### 4. Configure `datahub-frontend` to enable OIDC authentication
-
-a. Open the file `docker/datahub-frontend/env/docker.env`
-
-b. Add the following configuration values to the file:
-
-```
-AUTH_OIDC_ENABLED=true
-AUTH_OIDC_CLIENT_ID=your-client-id
-AUTH_OIDC_CLIENT_SECRET=your-client-secret
-AUTH_OIDC_DISCOVERY_URI=https://your-okta-domain.com/.well-known/openid-configuration
-AUTH_OIDC_BASE_URL=your-datahub-url
-AUTH_OIDC_SCOPE="openid profile email groups"
-```
-
-Replacing the placeholders above with the client id & client secret received from Okta in Step 2.
-
-> **Pro Tip!** You can easily enable Okta to return the groups that a user is associated with, which will be provisioned in DataHub, along with the user logging in. This can be enabled by setting the `AUTH_OIDC_EXTRACT_GROUPS_ENABLED` flag to `true`.
-> if they do not already exist in DataHub. You can enable your Okta application to return a 'groups' claim from the Okta Console at Applications > Your Application -> Sign On -> OpenID Connect ID Token Settings (Requires an edit).
->
-> By default, we assume that the groups will appear in a claim named "groups". This can be customized using the `AUTH_OIDC_GROUPS_CLAIM` container configuration.
->
->
-
-
-
-
-
-### 5. Restart `datahub-frontend-react` docker container
-
-Now, simply restart the `datahub-frontend-react` container to enable the integration.
-
-```
-docker-compose -p datahub -f docker-compose.yml -f docker-compose.override.yml up datahub-frontend-react
-```
-
-Navigate to your DataHub domain to see SSO in action.
-
-## Resources
-- [OAuth 2.0 and OpenID Connect Overview](https://developer.okta.com/docs/concepts/oauth-openid/)
diff --git a/docs/authentication/guides/sso/configure-oidc-react.md b/docs/authentication/guides/sso/configure-oidc-react.md
index 1671673c09318..9b4af80bb0ccd 100644
--- a/docs/authentication/guides/sso/configure-oidc-react.md
+++ b/docs/authentication/guides/sso/configure-oidc-react.md
@@ -1,59 +1,201 @@
-# Overview
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# OIDC Authentication
The DataHub React application supports OIDC authentication built on top of the [Pac4j Play](https://github.com/pac4j/play-pac4j) library.
This enables operators of DataHub to integrate with 3rd party identity providers like Okta, Google, Keycloak, & more to authenticate their users.
-When configured, OIDC auth will be enabled between clients of the DataHub UI & `datahub-frontend` server. Beyond this point is considered
-to be a secure environment and as such authentication is validated & enforced only at the "front door" inside datahub-frontend.
+## 1. Register an app with your Identity Provider
-:::caution
-Even if OIDC is configured the root user can still login without OIDC by going
-to `/login` URL endpoint. It is recommended that you don't use the default
-credentials by mounting a different file in the front end container. To do this
-please see [this guide](../jaas.md) to mount a custom user.props file for a JAAS authenticated deployment.
+
+
+
+#### Create a project in the Google API Console
+
+Using an account linked to your organization, navigate to the [Google API Console](https://console.developers.google.com/) and select **New project**.
+Within this project, we will configure the OAuth2.0 screen and credentials.
+
+#### Create OAuth2.0 consent screen
+
+Navigate to **OAuth consent screen**. This is where you'll configure the screen your users see when attempting to
+log in to DataHub. Select **Internal** (if you only want your company users to have access) and then click **Create**.
+Note that in order to complete this step you should be logged into a Google account associated with your organization.
+
+Fill out the details in the App Information & Domain sections. Make sure the 'Application Home Page' provided matches where DataHub is deployed
+at your organization. Once you've completed this, **Save & Continue**.
+
+
+
+
+
+#### Configure the scopes
+
+Next, click **Add or Remove Scopes**. Select the following scopes and click **Save & Continue**.
+
+- .../auth/userinfo.email
+- .../auth/userinfo.profile
+- openid
+
+
+
+
+#### Create an application in Okta Developer Console
+
+Log in to your Okta admin account & navigate to the developer console. Select **Applications**, then **Add Application**, then **Create New App** to create a new app.
+Select `Web` as the **Platform**, and `OpenID Connect` as the **Sign on method**.
+
+Click **Create**, name your application under **General Settings**, and save.
+
+- **Login Redirect URI**: `https://your-datahub-domain.com/callback/oidc`
+- **Logout Redirect URI**: `https://your-datahub-domain.com`
+
+
+
+
+
+:::note Optional
+If you're enabling DataHub login as an Okta tile, you'll need to provide the **Initiate Login URI**. You
+can set it to `https://your-datahub-domain.com/authenticate`. If you're just testing locally, this can be `http://localhost:9002`.
:::
-## Provider-Specific Guides
+
+
-1. [Configuring OIDC using Google](configure-oidc-react-google.md)
-2. [Configuring OIDC using Okta](configure-oidc-react-okta.md)
-3. [Configuring OIDC using Azure](configure-oidc-react-azure.md)
+#### Create an application registration in Microsoft Azure portal
-## Configuring OIDC in React
+Using an account linked to your organization, navigate to the [Microsoft Azure Portal](https://portal.azure.com). Select **App registrations**, then **New registration** to register a new app.
-### 1. Register an app with your Identity Provider
+Name your app registration and choose who can access your application.
-To configure OIDC in React, you will most often need to register yourself as a client with your identity provider (Google, Okta, etc). Each provider may
-have their own instructions. Provided below are links to examples for Okta, Google, Azure AD, & Keycloak.
+- **Redirect URI**: Select **Web** as type and enter `https://your-datahub-domain.com/callback/oidc`
-- [Registering an App in Okta](https://developer.okta.com/docs/guides/add-an-external-idp/openidconnect/main/)
-- [OpenID Connect in Google Identity](https://developers.google.com/identity/protocols/oauth2/openid-connect)
-- [OpenID Connect authentication with Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/auth-oidc)
-- [Keycloak - Securing Applications and Services Guide](https://www.keycloak.org/docs/latest/securing_apps/)
+Azure supports more than one redirect URI, so you can also add a local testing URI (e.g. `http://localhost:9002/callback/oidc`) from the **Authentication** tab once the registration is complete.
+At this point, your app registration should look like the following. Finally, click **Register**.
+
+
+
+
-During the registration process, you'll need to provide a login redirect URI to the identity provider. This tells the identity provider
-where to redirect to once they've authenticated the end user.
+:::note Optional
+Once registration is done, you will land on the app registration **Overview** tab.
+On the left-side navigation bar, click on **Authentication** under **Manage** and add extra redirect URIs if need be (if you want to support both local testing and Azure deployments). Finally, click **Save**.
-By default, the URL will be constructed as follows:
+
+
+
-> "http://your-datahub-domain.com/callback/oidc"
+:::
+
+#### Configure Certificates & secrets
+
+On the left-side navigation bar, click on **Certificates & secrets** under **Manage**.
+Select **Client secrets**, then **New client secret**. Type in a meaningful description for your secret and select an expiry. Click the **Add** button when you are done.
+Copy the value of your newly created secret since Azure will never display its value afterwards.
+
+
+
+
+
+#### Configure API permissions
+
+On the left-side navigation bar, click on **API permissions** under **Manage**. DataHub requires the following four Microsoft Graph APIs:
-For example, if you're hosted DataHub at `datahub.myorg.com`, this
-value would be `http://datahub.myorg.com/callback/oidc`. For testing purposes you can also specify localhost as the domain name
-directly: `http://localhost:9002/callback/oidc`
+- User.Read _(should be already configured)_
+- profile
+- email
+- openid
+
+Click on **Add a permission**, then from the **Microsoft APIs** tab select **Microsoft Graph**, then **Delegated permissions**. From the **OpenId permissions** category, select `email`, `openid`, `profile` and click **Add permissions**.
+
+At this point, you should be looking at a screen like the following:
+
+
+
+
+
+
+
+
+## 2. Obtain Client Credentials & Discovery URL
The goal of this step should be to obtain the following values, which will need to be configured before deploying DataHub:
-1. **Client ID** - A unique identifier for your application with the identity provider
-2. **Client Secret** - A shared secret to use for exchange between you and your identity provider
-3. **Discovery URL** - A URL where the OIDC API of your identity provider can be discovered. This should suffixed by
- `.well-known/openid-configuration`. Sometimes, identity providers will not explicitly include this URL in their setup guides, though
- this endpoint *will* exist as per the OIDC specification. For more info see http://openid.net/specs/openid-connect-discovery-1_0.html.
+- **Client ID** - A unique identifier for your application with the identity provider
+- **Client Secret** - A shared secret to use for exchange between you and your identity provider
+- **Discovery URL** - A URL where the OIDC API of your identity provider can be discovered. This should be suffixed by
+ `.well-known/openid-configuration`. Sometimes, identity providers will not explicitly include this URL in their setup guides, though
+ this endpoint _will_ exist as per the OIDC specification. For more info see http://openid.net/specs/openid-connect-discovery-1_0.html.
+
+
+
+
+
+**Obtain Client Credentials**
+
+Navigate to the **Credentials** tab. Click **Create Credentials** & select **OAuth client ID** as the credential type.
+
+On the following screen, select **Web application** as your Application Type.
+Add the domain where DataHub is hosted to your 'Authorized Javascript Origins'.
+
+```
+https://your-datahub-domain.com
+```
+
+Add the domain where DataHub is hosted with the path `/callback/oidc` appended to 'Authorized Redirect URLs'. Finally, click **Create**.
+
+```
+https://your-datahub-domain.com/callback/oidc
+```
+
+You will now receive a pair of values, a client id and a client secret. Bookmark these for the next step.
+
+
+
+
+**Obtain Client Credentials**
+
+After registering the app, you should see the client credentials. Bookmark the `Client id` and `Client secret` for the next step.
+
+**Obtain Discovery URI**
+
+On the same page, you should see an `Okta Domain`. Your OIDC discovery URI will be formatted as follows:
+
+```
+https://your-okta-domain.com/.well-known/openid-configuration
+```
+
+For example, `https://dev-33231928.okta.com/.well-known/openid-configuration`.
+
+At this point, you should be looking at a screen like the following:
+
+
+
+
+
+
-### 2. Configure DataHub Frontend Server
+**Obtain Application (Client) ID**
-The second step to enabling OIDC involves configuring `datahub-frontend` to enable OIDC authentication with your Identity Provider.
+On the left-side navigation bar, go back to the **Overview** tab. You should see the `Application (client) ID`. Save its value for the next step.
+
+**Obtain Discovery URI**
+
+On the same page, you should see a `Directory (tenant) ID`. Your OIDC discovery URI will be formatted as follows:
+
+```
+https://login.microsoftonline.com/{tenant ID}/v2.0/.well-known/openid-configuration
+```
+
+
+
+
+## 3. Configure DataHub Frontend Server
+
+### Docker
+
+The next step to enabling OIDC involves configuring `datahub-frontend` to enable OIDC authentication with your Identity Provider.
To do so, you must update the `datahub-frontend` [docker.env](../../../../docker/datahub-frontend/env/docker.env) file with the
values received from your identity provider:
@@ -67,22 +209,29 @@ AUTH_OIDC_DISCOVERY_URI=your-provider-discovery-url
AUTH_OIDC_BASE_URL=your-datahub-url
```
-- `AUTH_OIDC_ENABLED`: Enable delegating authentication to OIDC identity provider
-- `AUTH_OIDC_CLIENT_ID`: Unique client id received from identity provider
-- `AUTH_OIDC_CLIENT_SECRET`: Unique client secret received from identity provider
-- `AUTH_OIDC_DISCOVERY_URI`: Location of the identity provider OIDC discovery API. Suffixed with `.well-known/openid-configuration`
-- `AUTH_OIDC_BASE_URL`: The base URL of your DataHub deployment, e.g. https://yourorgdatahub.com (prod) or http://localhost:9002 (testing)
-- `AUTH_SESSION_TTL_HOURS`: The length of time in hours before a user will be prompted to login again. Controls the actor cookie expiration time in the browser. Numeric value converted to hours, default 24.
-- `MAX_SESSION_TOKEN_AGE`: Determines the expiration time of a session token. Session tokens are stateless so this determines at what time a session token may no longer be used and a valid session token can be used until this time has passed. Accepts a valid relative Java date style String, default 24h.
+| Configuration | Description | Default |
+| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------- |
+| AUTH_OIDC_ENABLED | Enable delegating authentication to OIDC identity provider | |
+| AUTH_OIDC_CLIENT_ID | Unique client id received from identity provider | |
+| AUTH_OIDC_CLIENT_SECRET | Unique client secret received from identity provider | |
+| AUTH_OIDC_DISCOVERY_URI | Location of the identity provider OIDC discovery API. Suffixed with `.well-known/openid-configuration` | |
+| AUTH_OIDC_BASE_URL | The base URL of your DataHub deployment, e.g. https://yourorgdatahub.com (prod) or http://localhost:9002 (testing) | |
+| AUTH_SESSION_TTL_HOURS | The length of time in hours before a user will be prompted to login again. Controls the actor cookie expiration time in the browser. Numeric value converted to hours. | 24 |
+| MAX_SESSION_TOKEN_AGE | Determines the expiration time of a session token. Session tokens are stateless so this determines at what time a session token may no longer be used and a valid session token can be used until this time has passed. Accepts a valid relative Java date style String. | 24h |
Providing these configs will cause DataHub to delegate authentication to your identity
provider, requesting the "oidc email profile" scopes and parsing the "preferred_username" claim from
the authenticated profile as the DataHub CorpUser identity.
+:::note
+
+By default, the login callback endpoint exposed by DataHub will be located at `${AUTH_OIDC_BASE_URL}/callback/oidc`. This must **exactly** match the login redirect URL you've registered with your identity provider in step 1.
+
+:::
-> By default, the login callback endpoint exposed by DataHub will be located at `${AUTH_OIDC_BASE_URL}/callback/oidc`. This must **exactly** match the login redirect URL you've registered with your identity provider in step 1.
+### Kubernetes
-In kubernetes, you can add the above env variables in the values.yaml as follows.
+In Kubernetes, you can add the above env variables in the `values.yaml` as follows.
```yaml
datahub-frontend:
@@ -102,20 +251,21 @@ datahub-frontend:
You can also package OIDC client secrets into a k8s secret by running
-```kubectl create secret generic datahub-oidc-secret --from-literal=secret=<>```
+```
+kubectl create secret generic datahub-oidc-secret --from-literal=secret=<>
+```
Then set the secret env as follows.
```yaml
- - name: AUTH_OIDC_CLIENT_SECRET
- valueFrom:
- secretKeyRef:
- name: datahub-oidc-secret
- key: secret
+- name: AUTH_OIDC_CLIENT_SECRET
+ valueFrom:
+ secretKeyRef:
+ name: datahub-oidc-secret
+ key: secret
```
-
-#### Advanced
+### Advanced OIDC Configurations
You can optionally customize the flow further using advanced configurations. These allow
you to specify the OIDC scopes requested, how the DataHub username is parsed from the claims returned by the identity provider, and how users and groups are extracted and provisioned from the OIDC claim set.
@@ -128,23 +278,15 @@ AUTH_OIDC_SCOPE=your-custom-scope
AUTH_OIDC_CLIENT_AUTHENTICATION_METHOD=authentication-method
```
-- `AUTH_OIDC_USER_NAME_CLAIM`: The attribute that will contain the username used on the DataHub platform. By default, this is "email" provided
- as part of the standard `email` scope.
-- `AUTH_OIDC_USER_NAME_CLAIM_REGEX`: A regex string used for extracting the username from the userNameClaim attribute. For example, if
- the userNameClaim field will contain an email address, and we want to omit the domain name suffix of the email, we can specify a custom
- regex to do so. (e.g. `([^@]+)`)
-- `AUTH_OIDC_SCOPE`: a string representing the scopes to be requested from the identity provider, granted by the end user. For more info,
- see [OpenID Connect Scopes](https://auth0.com/docs/scopes/openid-connect-scopes).
-- `AUTH_OIDC_CLIENT_AUTHENTICATION_METHOD`: a string representing the token authentication method to use with the identity provider. Default value
- is `client_secret_basic`, which uses HTTP Basic authentication. Another option is `client_secret_post`, which includes the client_id and secret_id
- as form parameters in the HTTP POST request. For more info, see [OAuth 2.0 Client Authentication](https://darutk.medium.com/oauth-2-0-client-authentication-4b5f929305d4)
-
-Additional OIDC Options:
+| Configuration | Description | Default |
+| -------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- |
+| AUTH_OIDC_USER_NAME_CLAIM | The attribute that will contain the username used on the DataHub platform. By default, this is "email" provided as part of the standard `email` scope. | |
+| AUTH_OIDC_USER_NAME_CLAIM_REGEX | A regex string used for extracting the username from the userNameClaim attribute. For example, if the userNameClaim field will contain an email address, and we want to omit the domain name suffix of the email, we can specify a custom regex to do so. (e.g. `([^@]+)`) | |
+| AUTH_OIDC_SCOPE | A string representing the scopes to be requested from the identity provider, granted by the end user. For more info, see [OpenID Connect Scopes](https://auth0.com/docs/scopes/openid-connect-scopes). | |
+| AUTH_OIDC_CLIENT_AUTHENTICATION_METHOD | A string representing the token authentication method to use with the identity provider. Default value is `client_secret_basic`, which uses HTTP Basic authentication. Another option is `client_secret_post`, which includes the client_id and secret_id as form parameters in the HTTP POST request. For more info, see [OAuth 2.0 Client Authentication](https://darutk.medium.com/oauth-2-0-client-authentication-4b5f929305d4) | client_secret_basic |
+| AUTH_OIDC_PREFERRED_JWS_ALGORITHM | Can be used to select a preferred signing algorithm for id tokens. Examples include: `RS256` or `HS256`. If your IdP includes `none` before `RS256`/`HS256` in the list of signing algorithms, then this value **MUST** be set. | |
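+
+As an illustration of how `AUTH_OIDC_USER_NAME_CLAIM_REGEX` behaves, the `([^@]+)` pattern keeps everything before the `@` of an email-style claim, so `jane.doe@example.com` becomes the DataHub username `jane.doe`. A quick local sketch (not part of DataHub itself):
+
+```shell
+# Emulate what the ([^@]+) capture group extracts from an email claim
+echo "jane.doe@example.com" | grep -oE '^[^@]+'   # prints: jane.doe
+```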
-- `AUTH_OIDC_PREFERRED_JWS_ALGORITHM` - Can be used to select a preferred signing algorithm for id tokens. Examples include: `RS256` or `HS256`. If
-your IdP includes `none` before `RS256`/`HS256` in the list of signing algorithms, then this value **MUST** be set.
-
-##### User & Group Provisioning (JIT Provisioning)
+### User & Group Provisioning (JIT Provisioning)
By default, DataHub will optimistically attempt to provision users and groups that do not already exist at the time of login.
For users, we extract information like first name, last name, display name, & email to construct a basic user profile. If a groups claim is present,
@@ -160,26 +302,30 @@ AUTH_OIDC_EXTRACT_GROUPS_ENABLED=false
AUTH_OIDC_GROUPS_CLAIM=
```
-- `AUTH_OIDC_JIT_PROVISIONING_ENABLED`: Whether DataHub users & groups should be provisioned on login if they do not exist. Defaults to true.
-- `AUTH_OIDC_PRE_PROVISIONING_REQUIRED`: Whether the user should already exist in DataHub when they login, failing login if they are not. This is appropriate for situations in which users and groups are batch ingested and tightly controlled inside your environment. Defaults to false.
-- `AUTH_OIDC_EXTRACT_GROUPS_ENABLED`: Only applies if `AUTH_OIDC_JIT_PROVISIONING_ENABLED` is set to true. This determines whether we should attempt to extract a list of group names from a particular claim in the OIDC attributes. Note that if this is enabled, each login will re-sync group membership with the groups in your Identity Provider, clearing the group membership that has been assigned through the DataHub UI. Enable with care! Defaults to false.
-- `AUTH_OIDC_GROUPS_CLAIM`: Only applies if `AUTH_OIDC_EXTRACT_GROUPS_ENABLED` is set to true. This determines which OIDC claims will contain a list of string group names. Accepts multiple claim names with comma-separated values. I.e: `groups, teams, departments`. Defaults to 'groups'.
+| Configuration | Description | Default |
+| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
+| AUTH_OIDC_JIT_PROVISIONING_ENABLED | Whether DataHub users & groups should be provisioned on login if they do not exist. | true |
+| AUTH_OIDC_PRE_PROVISIONING_REQUIRED | Whether the user should already exist in DataHub when they login, failing login if they are not. This is appropriate for situations in which users and groups are batch ingested and tightly controlled inside your environment. | false |
+| AUTH_OIDC_EXTRACT_GROUPS_ENABLED | Only applies if `AUTH_OIDC_JIT_PROVISIONING_ENABLED` is set to true. This determines whether we should attempt to extract a list of group names from a particular claim in the OIDC attributes. Note that if this is enabled, each login will re-sync group membership with the groups in your Identity Provider, clearing the group membership that has been assigned through the DataHub UI. Enable with care! | false |
+| AUTH_OIDC_GROUPS_CLAIM | Only applies if `AUTH_OIDC_EXTRACT_GROUPS_ENABLED` is set to true. This determines which OIDC claims will contain a list of string group names. Accepts multiple claim names with comma-separated values. I.e: `groups, teams, departments`. | groups |
+## 4. Restart datahub-frontend-react
-Once configuration has been updated, `datahub-frontend-react` will need to be restarted to pick up the new environment variables:
+Once configured, restarting the `datahub-frontend-react` container will enable an indirect authentication flow in which DataHub delegates authentication to the specified identity provider.
```
docker-compose -p datahub -f docker-compose.yml -f docker-compose.override.yml up datahub-frontend-react
```
->Note that by default, enabling OIDC will *not* disable the dummy JAAS authentication path, which can be reached at the `/login`
-route of the React app. To disable this authentication path, additionally specify the following config:
-> `AUTH_JAAS_ENABLED=false`
+Navigate to your DataHub domain to see SSO in action.
-### Summary
+:::caution
+By default, enabling OIDC will _not_ disable the dummy JAAS authentication path, which can be reached at the `/login`
+route of the React app. To disable this authentication path, additionally specify the following config:
+`AUTH_JAAS_ENABLED=false`
+:::
-Once configured, deploying the `datahub-frontend-react` container will enable an indirect authentication flow in which DataHub delegates
-authentication to the specified identity provider.
+## Summary
Once a user is authenticated by the identity provider, DataHub will extract a username from the provided claims
and grant DataHub access to the user by setting a pair of session cookies.
@@ -196,44 +342,45 @@ A brief summary of the steps that occur when the user navigates to the React app
7. DataHub sets session cookies for the newly authenticated user
8. DataHub redirects the user to the homepage ("/")
-## FAQ
+## Troubleshooting
-**No users can log in. Instead, I get redirected to the login page with an error. What do I do?**
+
+**No users can log in. Instead, I get redirected to the login page with an error. What do I do?**
This can occur for a variety of reasons, but most often it is due to misconfiguration of Single-Sign On, either on the DataHub
-side or on the Identity Provider side.
-
-First, verify that all values are consistent across them (e.g. the host URL where DataHub is deployed), and that no values
-are misspelled (client id, client secret).
+side or on the Identity Provider side.
-Next, verify that the scopes requested are supported by your Identity Provider
-and that the claim (i.e. attribute) DataHub uses for uniquely identifying the user is supported by your Identity Provider (refer to Identity Provider OpenID Connect documentation). By default, this claim is `email`.
+- Verify that all values are consistent across them (e.g. the host URL where DataHub is deployed), and that no values are misspelled (client id, client secret).
+- Verify that the scopes requested are supported by your Identity Provider and that the claim (i.e. attribute) DataHub uses for uniquely identifying the user is supported by your Identity Provider (refer to Identity Provider OpenID Connect documentation). By default, this claim is `email`.
+- Make sure the Discovery URI you've configured (`AUTH_OIDC_DISCOVERY_URI`) is accessible where the datahub-frontend container is running. You can do this by issuing a basic CURL to the address, as sketched after this list (**Pro-Tip**: you may also visit the address in your browser to check more specific details about your Identity Provider).
+- Check the container logs for the `datahub-frontend` container. This should hopefully provide some additional context around why exactly the login handoff is not working.
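+
+For example, a quick reachability check of the discovery endpoint from wherever `datahub-frontend` runs might look like the sketch below. The Okta-style URL is only a placeholder; substitute your own `AUTH_OIDC_DISCOVERY_URI`.
+
+```shell
+# A reachable, well-formed discovery document returns JSON containing these standard fields
+curl -s "https://your-okta-domain.com/.well-known/openid-configuration" \
+  | jq '{issuer, authorization_endpoint, token_endpoint, jwks_uri}'
+```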
-Then, make sure the Discovery URI you've configured (`AUTH_OIDC_DISCOVERY_URI`) is accessible where the datahub-frontend container is running. You
-can do this by issuing a basic CURL to the address (**Pro-Tip**: you may also visit the address in your browser to check more specific details about your Identity Provider).
+If all else fails, feel free to reach out to the DataHub Community on Slack for real-time support.
-Finally, check the container logs for the `datahub-frontend` container. This should hopefully provide some additional context
-around why exactly the login handoff is not working.
+
-If all else fails, feel free to reach out to the DataHub Community on Slack for
-real-time support
-
-
-
-**I'm seeing an error in the `datahub-frontend` logs when a user tries to login**
-```shell
-Caused by: java.lang.RuntimeException: Failed to resolve user name claim from profile provided by Identity Provider. Missing attribute. Attribute: 'email', Regex: '(.*)', Profile: { ...
-```
-**what do I do?**
+
+
+**I'm seeing an error in the `datahub-frontend` logs when a user tries to login: `Caused by: java.lang.RuntimeException: Failed to resolve user name claim from profile provided by Identity Provider. Missing attribute. Attribute: 'email', Regex: '(.*)', Profile: { ...`. What do I do?**
+
This indicates that your Identity Provider does not provide the claim with name 'email', which DataHub
uses by default to uniquely identify users within your organization.
-To fix this, you may need to
+To fix this, you may need to
-1. Change the claim that is used as the unique user identifier to something else by changing the `AUTH_OIDC_USER_NAME_CLAIM` (e.g. to "name" or "preferred_username") _OR_
+1. Change the claim that is used as the unique user identifier to something else by changing the `AUTH_OIDC_USER_NAME_CLAIM` (e.g. to "name" or "preferred_username") _OR_
2. Change the environment variable `AUTH_OIDC_SCOPE` to include the scope required to retrieve the claim with name "email"
-For the `datahub-frontend` container / pod.
+For the `datahub-frontend` container / pod.
+
+
+
+## Reference
-**Pro-Tip**: Check the documentation for your Identity Provider to learn more about the scope claims supported.
+Check the documentation for your Identity Provider to learn more about the scope claims supported.
+
+- [Registering an App in Okta](https://developer.okta.com/docs/guides/add-an-external-idp/openidconnect/main/)
+- [OpenID Connect in Google Identity](https://developers.google.com/identity/protocols/oauth2/openid-connect)
+- [OpenID Connect authentication with Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/auth-oidc)
+- [Keycloak - Securing Applications and Services Guide](https://www.keycloak.org/docs/latest/securing_apps/)
From ec9725026dca7b89d6a6464ea9b5c547debf42e5 Mon Sep 17 00:00:00 2001
From: Harshal Sheth
Date: Thu, 2 Nov 2023 09:39:08 -0700
Subject: [PATCH 18/34] chore(ingest): remove legacy memory_leak_detector
(#9158)
---
.../src/datahub/cli/ingest_cli.py | 4 -
metadata-ingestion/src/datahub/entrypoints.py | 15 ---
.../ingestion/source/looker/looker_config.py | 6 +-
.../datahub/utilities/memory_leak_detector.py | 106 ------------------
.../tests/integration/snowflake/common.py | 3 +-
.../tests/unit/test_snowflake_source.py | 15 +--
6 files changed, 10 insertions(+), 139 deletions(-)
delete mode 100644 metadata-ingestion/src/datahub/utilities/memory_leak_detector.py
diff --git a/metadata-ingestion/src/datahub/cli/ingest_cli.py b/metadata-ingestion/src/datahub/cli/ingest_cli.py
index 9b5716408f3e4..dd0287004a368 100644
--- a/metadata-ingestion/src/datahub/cli/ingest_cli.py
+++ b/metadata-ingestion/src/datahub/cli/ingest_cli.py
@@ -27,7 +27,6 @@
from datahub.ingestion.run.pipeline import Pipeline
from datahub.telemetry import telemetry
from datahub.upgrade import upgrade
-from datahub.utilities import memory_leak_detector
logger = logging.getLogger(__name__)
@@ -98,7 +97,6 @@ def ingest() -> None:
@click.option(
"--no-spinner", type=bool, is_flag=True, default=False, help="Turn off spinner"
)
-@click.pass_context
@telemetry.with_telemetry(
capture_kwargs=[
"dry_run",
@@ -109,9 +107,7 @@ def ingest() -> None:
"no_spinner",
]
)
-@memory_leak_detector.with_leak_detection
def run(
- ctx: click.Context,
config: str,
dry_run: bool,
preview: bool,
diff --git a/metadata-ingestion/src/datahub/entrypoints.py b/metadata-ingestion/src/datahub/entrypoints.py
index 5bfab3b841fa3..0cd37cc939854 100644
--- a/metadata-ingestion/src/datahub/entrypoints.py
+++ b/metadata-ingestion/src/datahub/entrypoints.py
@@ -70,21 +70,10 @@
version=datahub_package.nice_version_name(),
prog_name=datahub_package.__package_name__,
)
-@click.option(
- "-dl",
- "--detect-memory-leaks",
- type=bool,
- is_flag=True,
- default=False,
- help="Run memory leak detection.",
-)
-@click.pass_context
def datahub(
- ctx: click.Context,
debug: bool,
log_file: Optional[str],
debug_vars: bool,
- detect_memory_leaks: bool,
) -> None:
if debug_vars:
# debug_vars implies debug. This option isn't actually used here, but instead
@@ -109,10 +98,6 @@ def datahub(
_logging_configured = configure_logging(debug=debug, log_file=log_file)
_logging_configured.__enter__()
- # Setup the context for the memory_leak_detector decorator.
- ctx.ensure_object(dict)
- ctx.obj["detect_memory_leaks"] = detect_memory_leaks
-
@datahub.command()
@telemetry.with_telemetry()
diff --git a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py
index 96c405f7257d0..98d58c9fc9d87 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py
@@ -121,7 +121,10 @@ class LookerCommonConfig(DatasetSourceConfigMixin):
"discoverable. When disabled, adds this information to the description of the column.",
)
platform_name: str = Field(
- "looker", description="Default platform name. Don't change."
+ # TODO: This shouldn't be part of the config.
+ "looker",
+ description="Default platform name.",
+ hidden_from_docs=True,
)
extract_column_level_lineage: bool = Field(
True,
@@ -213,7 +216,6 @@ def external_url_defaults_to_api_config_base_url(
def stateful_ingestion_should_be_enabled(
cls, v: Optional[bool], *, values: Dict[str, Any], **kwargs: Dict[str, Any]
) -> Optional[bool]:
-
stateful_ingestion: StatefulStaleMetadataRemovalConfig = cast(
StatefulStaleMetadataRemovalConfig, values.get("stateful_ingestion")
)
diff --git a/metadata-ingestion/src/datahub/utilities/memory_leak_detector.py b/metadata-ingestion/src/datahub/utilities/memory_leak_detector.py
deleted file mode 100644
index 85ad0fb4938eb..0000000000000
--- a/metadata-ingestion/src/datahub/utilities/memory_leak_detector.py
+++ /dev/null
@@ -1,106 +0,0 @@
-import fnmatch
-import gc
-import logging
-import sys
-import tracemalloc
-from collections import defaultdict
-from functools import wraps
-from typing import Any, Callable, Dict, List, TypeVar, Union, cast
-
-import click
-from typing_extensions import Concatenate, ParamSpec
-
-logger = logging.getLogger(__name__)
-T = TypeVar("T")
-P = ParamSpec("P")
-
-
-def _trace_has_file(trace: tracemalloc.Traceback, file_pattern: str) -> bool:
- for frame_index in range(len(trace)):
- cur_frame = trace[frame_index]
- if fnmatch.fnmatch(cur_frame.filename, file_pattern):
- return True
- return False
-
-
-def _init_leak_detection() -> None:
- # Initialize trace malloc to track up to 25 stack frames.
- tracemalloc.start(25)
- if sys.version_info >= (3, 9):
- # Nice to reset peak to 0. Available for versions >= 3.9.
- tracemalloc.reset_peak()
- # Enable leak debugging in the garbage collector.
- gc.set_debug(gc.DEBUG_LEAK)
-
-
-def _perform_leak_detection() -> None:
- # Log potentially useful memory usage metrics
- logger.info(f"GC count before collect {gc.get_count()}")
- traced_memory_size, traced_memory_peak = tracemalloc.get_traced_memory()
- logger.info(f"Traced Memory: size={traced_memory_size}, peak={traced_memory_peak}")
- num_unreacheable_objects = gc.collect()
- logger.info(f"Number of unreachable objects = {num_unreacheable_objects}")
- logger.info(f"GC count after collect {gc.get_count()}")
-
- # Collect unique traces of all live objects in the garbage - these have potential leaks.
- unique_traces_to_objects: Dict[
- Union[tracemalloc.Traceback, int], List[object]
- ] = defaultdict(list)
- for obj in gc.garbage:
- obj_trace = tracemalloc.get_object_traceback(obj)
- if obj_trace is not None:
- if _trace_has_file(obj_trace, "*datahub/*.py"):
- # Leaking object
- unique_traces_to_objects[obj_trace].append(obj)
- else:
- unique_traces_to_objects[id(obj)].append(obj)
- logger.info("Potentially leaking objects start")
- for key, obj_list in sorted(
- unique_traces_to_objects.items(),
- key=lambda item: sum(
- [sys.getsizeof(o) for o in item[1]]
- ), # TODO: add support for deep sizeof
- reverse=True,
- ):
- if isinstance(key, tracemalloc.Traceback):
- obj_traceback: tracemalloc.Traceback = cast(tracemalloc.Traceback, key)
- logger.info(
- f"#Objects:{len(obj_list)}; Total memory:{sum([sys.getsizeof(obj) for obj in obj_list])};"
- + " Allocation Trace:\n\t"
- + "\n\t".join(obj_traceback.format(limit=25))
- )
- else:
- logger.info(
- f"#Objects:{len(obj_list)}; Total memory:{sum([sys.getsizeof(obj) for obj in obj_list])};"
- + " No Allocation Trace available!"
- )
- logger.info("Potentially leaking objects end")
-
- tracemalloc.stop()
-
-
-def with_leak_detection(
- func: Callable[Concatenate[click.Context, P], T]
-) -> Callable[Concatenate[click.Context, P], T]:
- @wraps(func)
- def wrapper(ctx: click.Context, *args: P.args, **kwargs: P.kwargs) -> Any:
- detect_leaks: bool = ctx.obj.get("detect_memory_leaks", False)
- if detect_leaks:
- logger.info(
- f"Initializing memory leak detection on command: {func.__module__}.{func.__name__}"
- )
- _init_leak_detection()
-
- try:
- return func(ctx, *args, **kwargs)
- finally:
- if detect_leaks:
- logger.info(
- f"Starting memory leak detection on command: {func.__module__}.{func.__name__}"
- )
- _perform_leak_detection()
- logger.info(
- f"Finished memory leak detection on command: {func.__module__}.{func.__name__}"
- )
-
- return wrapper
diff --git a/metadata-ingestion/tests/integration/snowflake/common.py b/metadata-ingestion/tests/integration/snowflake/common.py
index ff448eca01071..78e5499697311 100644
--- a/metadata-ingestion/tests/integration/snowflake/common.py
+++ b/metadata-ingestion/tests/integration/snowflake/common.py
@@ -565,5 +565,4 @@ def default_query_results( # noqa: C901
"DOMAIN": "DATABASE",
},
]
- # Unreachable code
- raise Exception(f"Unknown query {query}")
+ raise ValueError(f"Unexpected query: {query}")
diff --git a/metadata-ingestion/tests/unit/test_snowflake_source.py b/metadata-ingestion/tests/unit/test_snowflake_source.py
index 888a7c0441554..aaff878b81eee 100644
--- a/metadata-ingestion/tests/unit/test_snowflake_source.py
+++ b/metadata-ingestion/tests/unit/test_snowflake_source.py
@@ -368,8 +368,7 @@ def default_query_results(query):
return [('{"roles":"","value":""}',)]
elif query == "select current_warehouse()":
return [("TEST_WAREHOUSE")]
- # Unreachable code
- raise Exception()
+ raise ValueError(f"Unexpected query: {query}")
connection_mock = MagicMock()
cursor_mock = MagicMock()
@@ -397,8 +396,7 @@ def query_results(query):
]
elif query == 'show grants to role "PUBLIC"':
return []
- # Unreachable code
- raise Exception()
+ raise ValueError(f"Unexpected query: {query}")
config = {
"username": "user",
@@ -441,8 +439,7 @@ def query_results(query):
return [("", "USAGE", "DATABASE", "DB1")]
elif query == 'show grants to role "PUBLIC"':
return []
- # Unreachable code
- raise Exception()
+ raise ValueError(f"Unexpected query: {query}")
setup_mock_connect(mock_connect, query_results)
@@ -485,8 +482,7 @@ def query_results(query):
]
elif query == 'show grants to role "PUBLIC"':
return []
- # Unreachable code
- raise Exception()
+ raise ValueError(f"Unexpected query: {query}")
setup_mock_connect(mock_connect, query_results)
@@ -536,8 +532,7 @@ def query_results(query):
["", "USAGE", "VIEW", "SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY"],
["", "USAGE", "VIEW", "SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES"],
]
- # Unreachable code
- raise Exception()
+ raise ValueError(f"Unexpected query: {query}")
setup_mock_connect(mock_connect, query_results)
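The change above replaces an unreachable bare `Exception()` with a descriptive `ValueError` whenever a mocked cursor receives a query it has no canned answer for. A minimal, hypothetical sketch of that dispatch style (names invented for illustration, not taken from the test suite):

```python
from unittest.mock import MagicMock

def fake_query_results(query: str):
    # Return canned rows for known queries; fail loudly on anything else.
    if query == "select current_version()":
        return [("8.3.0",)]
    if query == "select current_warehouse()":
        return [("TEST_WAREHOUSE",)]
    raise ValueError(f"Unexpected query: {query}")

# A MagicMock with a function side_effect returns whatever the function returns.
cursor_mock = MagicMock()
cursor_mock.execute.side_effect = fake_query_results

print(cursor_mock.execute("select current_warehouse()"))  # [('TEST_WAREHOUSE',)]
# cursor_mock.execute("select 1") would raise ValueError("Unexpected query: select 1")
```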
From 148ad1ad9f00d6eb43d6acb270b9a90a745c8af3 Mon Sep 17 00:00:00 2001
From: Harshal Sheth
Date: Thu, 2 Nov 2023 09:44:35 -0700
Subject: [PATCH 19/34] feat(ingest/looker): support emitting unused explores
(#9159)
---
.../ingestion/source/looker/looker_common.py | 2 +-
.../ingestion/source/looker/looker_config.py | 4 ++
.../source/looker/looker_lib_wrapper.py | 7 +++
.../ingestion/source/looker/looker_source.py | 46 +++++++++++++------
4 files changed, 45 insertions(+), 14 deletions(-)
diff --git a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
index 30c38720dd96c..7ca5ce49019ab 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
@@ -388,7 +388,7 @@ def _get_field_type(
# if still not found, log and continue
if type_class is None:
- logger.info(
+ logger.debug(
f"The type '{native_type}' is not recognized for field type, setting as NullTypeClass.",
)
type_class = NullTypeClass
diff --git a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py
index 98d58c9fc9d87..e6ddea9a30489 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_config.py
@@ -205,6 +205,10 @@ class LookerDashboardSourceConfig(
False,
description="Extract looks which are not part of any Dashboard. To enable this flag the stateful_ingestion should also be enabled.",
)
+ emit_used_explores_only: bool = Field(
+ True,
+ description="When enabled, only explores that are used by a Dashboard/Look will be ingested.",
+ )
@validator("external_base_url", pre=True, always=True)
def external_url_defaults_to_api_config_base_url(
diff --git a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_lib_wrapper.py b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_lib_wrapper.py
index b00f74b71e792..988caba1c0d74 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_lib_wrapper.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_lib_wrapper.py
@@ -59,6 +59,7 @@ class LookerAPIStats(BaseModel):
lookml_model_calls: int = 0
all_dashboards_calls: int = 0
all_looks_calls: int = 0
+ all_models_calls: int = 0
get_query_calls: int = 0
search_looks_calls: int = 0
search_dashboards_calls: int = 0
@@ -155,6 +156,12 @@ def dashboard(self, dashboard_id: str, fields: Union[str, List[str]]) -> Dashboa
transport_options=self.transport_options,
)
+ def all_lookml_models(self) -> Sequence[LookmlModel]:
+ self.client_stats.all_models_calls += 1
+ return self.client.all_lookml_models(
+ transport_options=self.transport_options,
+ )
+
def lookml_model_explore(self, model: str, explore_name: str) -> LookmlModelExplore:
self.client_stats.explore_calls += 1
return self.client.lookml_model_explore(
diff --git a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
index 09683d790c14c..4a98e8874bca0 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
@@ -147,9 +147,12 @@ def __init__(self, config: LookerDashboardSourceConfig, ctx: PipelineContext):
)
self.reporter._looker_explore_registry = self.explore_registry
self.reporter._looker_api = self.looker_api
+
self.reachable_look_registry = set()
- self.explores_to_fetch_set: Dict[Tuple[str, str], List[str]] = {}
+ # (model, explore) -> list of charts/looks/dashboards that reference this explore
+ # The list values are used purely for debugging purposes.
+ self.reachable_explores: Dict[Tuple[str, str], List[str]] = {}
# Keep stat generators to generate entity stat aspect later
stat_generator_config: looker_usage.StatGeneratorConfig = (
@@ -378,11 +381,11 @@ def _get_input_fields_from_query(
return result
- def add_explore_to_fetch(self, model: str, explore: str, via: str) -> None:
- if (model, explore) not in self.explores_to_fetch_set:
- self.explores_to_fetch_set[(model, explore)] = []
+ def add_reachable_explore(self, model: str, explore: str, via: str) -> None:
+ if (model, explore) not in self.reachable_explores:
+ self.reachable_explores[(model, explore)] = []
- self.explores_to_fetch_set[(model, explore)].append(via)
+ self.reachable_explores[(model, explore)].append(via)
def _get_looker_dashboard_element( # noqa: C901
self, element: DashboardElement
@@ -403,7 +406,7 @@ def _get_looker_dashboard_element( # noqa: C901
f"Element {element.title}: Explores added via query: {explores}"
)
for exp in explores:
- self.add_explore_to_fetch(
+ self.add_reachable_explore(
model=element.query.model,
explore=exp,
via=f"look:{element.look_id}:query:{element.dashboard_id}",
@@ -439,7 +442,7 @@ def _get_looker_dashboard_element( # noqa: C901
explores = [element.look.query.view]
logger.debug(f"Element {title}: Explores added via look: {explores}")
for exp in explores:
- self.add_explore_to_fetch(
+ self.add_reachable_explore(
model=element.look.query.model,
explore=exp,
via=f"Look:{element.look_id}:query:{element.dashboard_id}",
@@ -483,7 +486,7 @@ def _get_looker_dashboard_element( # noqa: C901
)
for exp in explores:
- self.add_explore_to_fetch(
+ self.add_reachable_explore(
model=element.result_maker.query.model,
explore=exp,
via=f"Look:{element.look_id}:resultmaker:query",
@@ -495,7 +498,7 @@ def _get_looker_dashboard_element( # noqa: C901
if filterable.view is not None and filterable.model is not None:
model = filterable.model
explores.append(filterable.view)
- self.add_explore_to_fetch(
+ self.add_reachable_explore(
model=filterable.model,
explore=filterable.view,
via=f"Look:{element.look_id}:resultmaker:filterable",
@@ -694,20 +697,26 @@ def _make_dashboard_metadata_events(
def _make_explore_metadata_events(
self,
) -> Iterable[Union[MetadataChangeEvent, MetadataChangeProposalWrapper]]:
+ if self.source_config.emit_used_explores_only:
+ explores_to_fetch = list(self.reachable_explores.keys())
+ else:
+ explores_to_fetch = list(self.list_all_explores())
+ explores_to_fetch.sort()
+
with concurrent.futures.ThreadPoolExecutor(
max_workers=self.source_config.max_threads
) as async_executor:
- self.reporter.total_explores = len(self.explores_to_fetch_set)
+ self.reporter.total_explores = len(explores_to_fetch)
explore_futures = {
async_executor.submit(self.fetch_one_explore, model, explore): (
model,
explore,
)
- for (model, explore) in self.explores_to_fetch_set
+ for (model, explore) in explores_to_fetch
}
- for future in concurrent.futures.as_completed(explore_futures):
+ for future in concurrent.futures.wait(explore_futures).done:
events, explore_id, start_time, end_time = future.result()
del explore_futures[future]
self.reporter.explores_scanned += 1
@@ -717,6 +726,17 @@ def _make_explore_metadata_events(
f"Running time of fetch_one_explore for {explore_id}: {(end_time - start_time).total_seconds()}"
)
+ def list_all_explores(self) -> Iterable[Tuple[str, str]]:
+ # returns a list of (model, explore) tuples
+
+ for model in self.looker_api.all_lookml_models():
+ if model.name is None or model.explores is None:
+ continue
+ for explore in model.explores:
+ if explore.name is None:
+ continue
+ yield (model.name, explore.name)
+
def fetch_one_explore(
self, model: str, explore: str
) -> Tuple[
@@ -954,7 +974,7 @@ def _input_fields_from_dashboard_element(
)
if explore is not None:
# add this to the list of explores to finally generate metadata for
- self.add_explore_to_fetch(
+ self.add_reachable_explore(
input_field.model, input_field.explore, entity_urn
)
entity_urn = explore.get_explore_urn(self.source_config)
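The patch above renames `explores_to_fetch_set` to `reachable_explores` and, when `emit_used_explores_only` is disabled, falls back to `list_all_explores()`. A rough Python sketch of that bookkeeping, with simplified names that are not the actual source class:

```python
from typing import Dict, List, Tuple

class ReachableExploreTracker:
    """Tracks which (model, explore) pairs are referenced, and by what."""

    def __init__(self) -> None:
        # (model, explore) -> list of "via" strings, kept only for debugging.
        self.reachable: Dict[Tuple[str, str], List[str]] = {}

    def add(self, model: str, explore: str, via: str) -> None:
        self.reachable.setdefault((model, explore), []).append(via)

    def explores_to_fetch(
        self, emit_used_only: bool, all_explores: List[Tuple[str, str]]
    ) -> List[Tuple[str, str]]:
        # Mirror of the patch: used explores only, or fall back to the full list.
        explores = list(self.reachable) if emit_used_only else list(all_explores)
        explores.sort()
        return explores

tracker = ReachableExploreTracker()
tracker.add("sales_model", "orders", "look:123:query:dash-1")
print(tracker.explores_to_fetch(True, [("sales_model", "orders"), ("sales_model", "unused")]))
# [('sales_model', 'orders')]
```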
From 7ff48b37aaea165ba3c3cb6f9f9f742ea2e37654 Mon Sep 17 00:00:00 2001
From: david-leifker <114954101+david-leifker@users.noreply.github.com>
Date: Fri, 3 Nov 2023 10:23:37 -0500
Subject: [PATCH 20/34] refactor(policy): refactor policy locking, no
functional difference (#9163)
---
.../authorization/DataHubAuthorizer.java | 111 +++++++++---------
1 file changed, 55 insertions(+), 56 deletions(-)
diff --git a/metadata-service/auth-impl/src/main/java/com/datahub/authorization/DataHubAuthorizer.java b/metadata-service/auth-impl/src/main/java/com/datahub/authorization/DataHubAuthorizer.java
index e30fb93109915..f8b28f6c182a7 100644
--- a/metadata-service/auth-impl/src/main/java/com/datahub/authorization/DataHubAuthorizer.java
+++ b/metadata-service/auth-impl/src/main/java/com/datahub/authorization/DataHubAuthorizer.java
@@ -19,6 +19,7 @@
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
+import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import javax.annotation.Nonnull;
@@ -55,7 +56,8 @@ public enum AuthorizationMode {
// Maps privilege name to the associated set of policies for fast access.
// Not concurrent data structure because writes are always against the entire thing.
private final Map<String, List<DataHubPolicyInfo>> _policyCache = new HashMap<>(); // Shared Policy Cache.
- private final ReadWriteLock _lockPolicyCache = new ReentrantReadWriteLock();
+ private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock();
+ private final Lock readLock = readWriteLock.readLock();
private final ScheduledExecutorService _refreshExecutorService = Executors.newScheduledThreadPool(1);
private final PolicyRefreshRunnable _policyRefreshRunnable;
@@ -74,7 +76,7 @@ public DataHubAuthorizer(
_systemAuthentication = Objects.requireNonNull(systemAuthentication);
_mode = Objects.requireNonNull(mode);
_policyEngine = new PolicyEngine(systemAuthentication, Objects.requireNonNull(entityClient));
- _policyRefreshRunnable = new PolicyRefreshRunnable(systemAuthentication, new PolicyFetcher(entityClient), _policyCache, _lockPolicyCache);
+ _policyRefreshRunnable = new PolicyRefreshRunnable(systemAuthentication, new PolicyFetcher(entityClient), _policyCache, readWriteLock.writeLock());
_refreshExecutorService.scheduleAtFixedRate(_policyRefreshRunnable, delayIntervalSeconds, refreshIntervalSeconds, TimeUnit.SECONDS);
}
@@ -93,41 +95,30 @@ public AuthorizationResult authorize(@Nonnull final AuthorizationRequest request
Optional<ResolvedEntitySpec> resolvedResourceSpec = request.getResourceSpec().map(_entitySpecResolver::resolve);
- _lockPolicyCache.readLock().lock();
- try {
- // 1. Fetch the policies relevant to the requested privilege.
- final List<DataHubPolicyInfo> policiesToEvaluate = _policyCache.getOrDefault(request.getPrivilege(), new ArrayList<>());
-
- // 2. Evaluate each policy.
- for (DataHubPolicyInfo policy : policiesToEvaluate) {
- if (isRequestGranted(policy, request, resolvedResourceSpec)) {
- // Short circuit if policy has granted privileges to this actor.
- return new AuthorizationResult(request, AuthorizationResult.Type.ALLOW,
- String.format("Granted by policy with type: %s", policy.getType()));
- }
+ // 1. Fetch the policies relevant to the requested privilege.
+ final List<DataHubPolicyInfo> policiesToEvaluate = getOrDefault(request.getPrivilege(), new ArrayList<>());
+
+ // 2. Evaluate each policy.
+ for (DataHubPolicyInfo policy : policiesToEvaluate) {
+ if (isRequestGranted(policy, request, resolvedResourceSpec)) {
+ // Short circuit if policy has granted privileges to this actor.
+ return new AuthorizationResult(request, AuthorizationResult.Type.ALLOW,
+ String.format("Granted by policy with type: %s", policy.getType()));
}
- return new AuthorizationResult(request, AuthorizationResult.Type.DENY, null);
- } finally {
- _lockPolicyCache.readLock().unlock();
}
+ return new AuthorizationResult(request, AuthorizationResult.Type.DENY, null);
}
public List<String> getGrantedPrivileges(final String actor, final Optional<EntitySpec> resourceSpec) {
+ // 1. Fetch all policies
+ final List<DataHubPolicyInfo> policiesToEvaluate = getOrDefault(ALL, new ArrayList<>());
- _lockPolicyCache.readLock().lock();
- try {
- // 1. Fetch all policies
- final List<DataHubPolicyInfo> policiesToEvaluate = _policyCache.getOrDefault(ALL, new ArrayList<>());
-
- Urn actorUrn = UrnUtils.getUrn(actor);
- final ResolvedEntitySpec resolvedActorSpec = _entitySpecResolver.resolve(new EntitySpec(actorUrn.getEntityType(), actor));
+ Urn actorUrn = UrnUtils.getUrn(actor);
+ final ResolvedEntitySpec resolvedActorSpec = _entitySpecResolver.resolve(new EntitySpec(actorUrn.getEntityType(), actor));
- Optional<ResolvedEntitySpec> resolvedResourceSpec = resourceSpec.map(_entitySpecResolver::resolve);
+ Optional<ResolvedEntitySpec> resolvedResourceSpec = resourceSpec.map(_entitySpecResolver::resolve);
- return _policyEngine.getGrantedPrivileges(policiesToEvaluate, resolvedActorSpec, resolvedResourceSpec);
- } finally {
- _lockPolicyCache.readLock().unlock();
- }
+ return _policyEngine.getGrantedPrivileges(policiesToEvaluate, resolvedActorSpec, resolvedResourceSpec);
}
/**
@@ -143,36 +134,31 @@ public AuthorizedActors authorizedActors(
boolean allUsers = false;
boolean allGroups = false;
- _lockPolicyCache.readLock().lock();
- try {
- // Step 1: Find policies granting the privilege.
- final List<DataHubPolicyInfo> policiesToEvaluate = _policyCache.getOrDefault(privilege, new ArrayList<>());
-
- Optional<ResolvedEntitySpec> resolvedResourceSpec = resourceSpec.map(_entitySpecResolver::resolve);
+ // Step 1: Find policies granting the privilege.
+ final List<DataHubPolicyInfo> policiesToEvaluate = getOrDefault(privilege, new ArrayList<>());
+ Optional<ResolvedEntitySpec> resolvedResourceSpec = resourceSpec.map(_entitySpecResolver::resolve);
- // Step 2: For each policy, determine whether the resource is a match.
- for (DataHubPolicyInfo policy : policiesToEvaluate) {
- if (!PoliciesConfig.ACTIVE_POLICY_STATE.equals(policy.getState())) {
- // Policy is not active, skip.
- continue;
- }
+ // Step 2: For each policy, determine whether the resource is a match.
+ for (DataHubPolicyInfo policy : policiesToEvaluate) {
+ if (!PoliciesConfig.ACTIVE_POLICY_STATE.equals(policy.getState())) {
+ // Policy is not active, skip.
+ continue;
+ }
- final PolicyEngine.PolicyActors matchingActors = _policyEngine.getMatchingActors(policy, resolvedResourceSpec);
+ final PolicyEngine.PolicyActors matchingActors = _policyEngine.getMatchingActors(policy, resolvedResourceSpec);
- // Step 3: For each matching policy, add actors that are authorized.
- authorizedUsers.addAll(matchingActors.getUsers());
- authorizedGroups.addAll(matchingActors.getGroups());
- if (matchingActors.allUsers()) {
- allUsers = true;
- }
- if (matchingActors.allGroups()) {
- allGroups = true;
- }
+ // Step 3: For each matching policy, add actors that are authorized.
+ authorizedUsers.addAll(matchingActors.getUsers());
+ authorizedGroups.addAll(matchingActors.getGroups());
+ if (matchingActors.allUsers()) {
+ allUsers = true;
+ }
+ if (matchingActors.allGroups()) {
+ allGroups = true;
}
- } finally {
- _lockPolicyCache.readLock().unlock();
}
+
// Step 4: Return all authorized users and groups.
return new AuthorizedActors(privilege, authorizedUsers, authorizedGroups, allUsers, allGroups);
}
@@ -234,6 +220,16 @@ private Optional<Urn> getUrnFromRequestActor(String actor) {
}
}
+ private List<DataHubPolicyInfo> getOrDefault(String key, List<DataHubPolicyInfo> defaultValue) {
+ readLock.lock();
+ try {
+ return _policyCache.getOrDefault(key, defaultValue);
+ } finally {
+ // To unlock the acquired read thread
+ readLock.unlock();
+ }
+ }
+
/**
* A {@link Runnable} used to periodically fetch a new instance of the policies Cache.
*
@@ -247,7 +243,7 @@ static class PolicyRefreshRunnable implements Runnable {
private final Authentication _systemAuthentication;
private final PolicyFetcher _policyFetcher;
- private final Map<String, List<DataHubPolicyInfo>> _policyCache;
- private final ReadWriteLock _lockPolicyCache;
+ private final Lock writeLock;
@Override
public void run() {
@@ -274,13 +270,16 @@ public void run() {
return;
}
}
- _lockPolicyCache.writeLock().lock();
+
+ writeLock.lock();
try {
_policyCache.clear();
_policyCache.putAll(newCache);
} finally {
- _lockPolicyCache.writeLock().unlock();
+ // To unlock the acquired write thread
+ writeLock.unlock();
}
+
log.debug(String.format("Successfully fetched %s policies.", total));
} catch (Exception e) {
log.error("Caught exception while loading Policy cache. Will retry on next scheduled attempt.", e);
From 07311115c5ca436f64fad9c685cfc586cc5d4180 Mon Sep 17 00:00:00 2001
From: Kos Korchak <97058061+kkorchak@users.noreply.github.com>
Date: Fri, 3 Nov 2023 13:00:15 -0400
Subject: [PATCH 21/34] API test for managing access token privilege (#9167)
---
.../tests/privileges/test_privileges.py | 155 ++++++++++++++----
1 file changed, 127 insertions(+), 28 deletions(-)
diff --git a/smoke-test/tests/privileges/test_privileges.py b/smoke-test/tests/privileges/test_privileges.py
index 13d6b6cf3415a..740311754678e 100644
--- a/smoke-test/tests/privileges/test_privileges.py
+++ b/smoke-test/tests/privileges/test_privileges.py
@@ -52,6 +52,20 @@ def privileges_and_test_user_setup(admin_session):
wait_for_writes_to_sync()
+@tenacity.retry(
+ stop=tenacity.stop_after_attempt(sleep_times), wait=tenacity.wait_fixed(sleep_sec)
+)
+def _ensure_cant_perform_action(session, json,assertion_key):
+ action_response = session.post(
+ f"{get_frontend_url()}/api/v2/graphql", json=json)
+ action_response.raise_for_status()
+ action_data = action_response.json()
+
+ assert action_data["errors"][0]["extensions"]["code"] == 403
+ assert action_data["errors"][0]["extensions"]["type"] == "UNAUTHORIZED"
+ assert action_data["data"][assertion_key] == None
+
+
@tenacity.retry(
stop=tenacity.stop_after_attempt(10), wait=tenacity.wait_fixed(sleep_sec)
)
@@ -67,20 +81,6 @@ def _ensure_can_create_secret(session, json, urn):
assert secret_data["data"]["createSecret"] == urn
-@tenacity.retry(
- stop=tenacity.stop_after_attempt(sleep_times), wait=tenacity.wait_fixed(sleep_sec)
-)
-def _ensure_cant_create_secret(session, json):
- create_secret_response = session.post(
- f"{get_frontend_url()}/api/v2/graphql", json=json)
- create_secret_response.raise_for_status()
- create_secret_data = create_secret_response.json()
-
- assert create_secret_data["errors"][0]["extensions"]["code"] == 403
- assert create_secret_data["errors"][0]["extensions"]["type"] == "UNAUTHORIZED"
- assert create_secret_data["data"]["createSecret"] == None
-
-
@tenacity.retry(
stop=tenacity.stop_after_attempt(10), wait=tenacity.wait_fixed(sleep_sec)
)
@@ -99,17 +99,19 @@ def _ensure_can_create_ingestion_source(session, json):
@tenacity.retry(
- stop=tenacity.stop_after_attempt(sleep_times), wait=tenacity.wait_fixed(sleep_sec)
+ stop=tenacity.stop_after_attempt(10), wait=tenacity.wait_fixed(sleep_sec)
)
-def _ensure_cant_create_ingestion_source(session, json):
- create_source_response = session.post(
+def _ensure_can_create_access_token(session, json):
+ create_access_token_success = session.post(
f"{get_frontend_url()}/api/v2/graphql", json=json)
- create_source_response.raise_for_status()
- create_source_data = create_source_response.json()
+ create_access_token_success.raise_for_status()
+ ingestion_data = create_access_token_success.json()
- assert create_source_data["errors"][0]["extensions"]["code"] == 403
- assert create_source_data["errors"][0]["extensions"]["type"] == "UNAUTHORIZED"
- assert create_source_data["data"]["createIngestionSource"] == None
+ assert ingestion_data
+ assert ingestion_data["data"]
+ assert ingestion_data["data"]["createAccessToken"]
+ assert ingestion_data["data"]["createAccessToken"]["accessToken"] is not None
+ assert ingestion_data["data"]["createAccessToken"]["__typename"] == "AccessToken"
@pytest.mark.dependency(depends=["test_healthchecks"])
@@ -132,7 +134,7 @@ def test_privilege_to_create_and_manage_secrets():
}
},
}
- _ensure_cant_create_secret(user_session, create_secret)
+ _ensure_cant_perform_action(user_session, create_secret,"createSecret")
# Assign privileges to the new user to manage secrets
@@ -166,7 +168,7 @@ def test_privilege_to_create_and_manage_secrets():
remove_policy(policy_urn, admin_session)
# Ensure user can't create secret after policy is removed
- _ensure_cant_create_secret(user_session, create_secret)
+ _ensure_cant_perform_action(user_session, create_secret,"createSecret")
@pytest.mark.dependency(depends=["test_healthchecks"])
@@ -182,11 +184,18 @@ def test_privilege_to_create_and_manage_ingestion_source():
createIngestionSource(input: $input)\n}""",
"variables": {"input":{"type":"snowflake","name":"test","config":
{"recipe":
- "{\"source\":{\"type\":\"snowflake\",\"config\":{\"account_id\":null,\"include_table_lineage\":true,\"include_view_lineage\":true,\"include_tables\":true,\"include_views\":true,\"profiling\":{\"enabled\":true,\"profile_table_level_only\":true},\"stateful_ingestion\":{\"enabled\":true}}}}",
+ """{\"source\":{\"type\":\"snowflake\",\"config\":{
+ \"account_id\":null,
+ \"include_table_lineage\":true,
+ \"include_view_lineage\":true,
+ \"include_tables\":true,
+ \"include_views\":true,
+ \"profiling\":{\"enabled\":true,\"profile_table_level_only\":true},
+ \"stateful_ingestion\":{\"enabled\":true}}}}""",
"executorId":"default","debugMode":False,"extraArgs":[]}}},
}
- _ensure_cant_create_ingestion_source(user_session, create_ingestion_source)
+ _ensure_cant_perform_action(user_session, create_ingestion_source, "createIngestionSource")
# Assign privileges to the new user to manage ingestion source
@@ -201,7 +210,14 @@ def test_privilege_to_create_and_manage_ingestion_source():
updateIngestionSource(urn: $urn, input: $input)\n}""",
"variables": {"urn":ingestion_source_urn,
"input":{"type":"snowflake","name":"test updated",
- "config":{"recipe":"{\"source\":{\"type\":\"snowflake\",\"config\":{\"account_id\":null,\"include_table_lineage\":true,\"include_view_lineage\":true,\"include_tables\":true,\"include_views\":true,\"profiling\":{\"enabled\":true,\"profile_table_level_only\":true},\"stateful_ingestion\":{\"enabled\":true}}}}",
+ "config":{"recipe":"""{\"source\":{\"type\":\"snowflake\",\"config\":{
+ \"account_id\":null,
+ \"include_table_lineage\":true,
+ \"include_view_lineage\":true,
+ \"include_tables\":true,
+ \"include_views\":true,
+ \"profiling\":{\"enabled\":true,\"profile_table_level_only\":true},
+ \"stateful_ingestion\":{\"enabled\":true}}}}""",
"executorId":"default","debugMode":False,"extraArgs":[]}}}
}
@@ -238,4 +254,87 @@ def test_privilege_to_create_and_manage_ingestion_source():
remove_policy(policy_urn, admin_session)
# Ensure that user can't create ingestion source after policy is removed
- _ensure_cant_create_ingestion_source(user_session, create_ingestion_source)
\ No newline at end of file
+ _ensure_cant_perform_action(user_session, create_ingestion_source, "createIngestionSource")
+
+
+@pytest.mark.dependency(depends=["test_healthchecks"])
+def test_privilege_to_create_and_manage_access_tokens():
+
+ (admin_user, admin_pass) = get_admin_credentials()
+ admin_session = login_as(admin_user, admin_pass)
+ user_session = login_as("user", "user")
+
+
+ # Verify new user can't create access token
+ create_access_token = {
+ "query": """mutation createAccessToken($input: CreateAccessTokenInput!) {\n
+ createAccessToken(input: $input) {\n accessToken\n __typename\n }\n}\n""",
+ "variables": {"input":{"actorUrn":"urn:li:corpuser:user",
+ "type":"PERSONAL",
+ "duration":"ONE_MONTH",
+ "name":"test",
+ "description":"test"}}
+ }
+
+ _ensure_cant_perform_action(user_session, create_access_token,"createAccessToken")
+
+
+ # Assign privileges to the new user to create and manage access tokens
+ policy_urn = create_user_policy("urn:li:corpuser:user", ["MANAGE_ACCESS_TOKENS"], admin_session)
+
+
+ # Verify new user can create and manage access token(create, revoke)
+ # Create a access token
+ _ensure_can_create_access_token(user_session, create_access_token)
+
+
+ # List access tokens first to get token id
+ list_access_tokens = {
+ "query": """query listAccessTokens($input: ListAccessTokenInput!) {\n
+ listAccessTokens(input: $input) {\n
+ start\n count\n total\n tokens {\n urn\n type\n
+ id\n name\n description\n actorUrn\n ownerUrn\n
+ createdAt\n expiresAt\n __typename\n }\n __typename\n }\n}\n""",
+ "variables": {
+ "input":{
+ "start":0,"count":10,"filters":[{
+ "field":"ownerUrn",
+ "values":["urn:li:corpuser:user"]}]}
+ }
+ }
+
+ list_tokens_response = user_session.post(f"{get_frontend_url()}/api/v2/graphql", json=list_access_tokens)
+ list_tokens_response.raise_for_status()
+ list_tokens_data = list_tokens_response.json()
+
+ assert list_tokens_data
+ assert list_tokens_data["data"]
+ assert list_tokens_data["data"]["listAccessTokens"]["tokens"][0]["id"] is not None
+
+ access_token_id = list_tokens_data["data"]["listAccessTokens"]["tokens"][0]["id"]
+
+
+ # Revoke access token
+ revoke_access_token = {
+ "query": "mutation revokeAccessToken($tokenId: String!) {\n revokeAccessToken(tokenId: $tokenId)\n}\n",
+ "variables": {
+ "tokenId": access_token_id
+ },
+ }
+
+ revoke_token_response = user_session.post(f"{get_frontend_url()}/api/v2/graphql", json=revoke_access_token)
+ revoke_token_response.raise_for_status()
+ revoke_token_data = revoke_token_response.json()
+
+ assert revoke_token_data
+ assert revoke_token_data["data"]
+ assert revoke_token_data["data"]["revokeAccessToken"]
+ assert revoke_token_data["data"]["revokeAccessToken"] is True
+
+
+ # Remove the policy
+ remove_policy(policy_urn, admin_session)
+
+
+ # Ensure that user can't create access token after policy is removed
+ _ensure_cant_perform_action(user_session, create_access_token,"createAccessToken")
\ No newline at end of file
From ddb4e1b5ffa01763d7d3353a506d4329faf11e25 Mon Sep 17 00:00:00 2001
From: Davi Arnaut
Date: Fri, 3 Nov 2023 10:26:11 -0700
Subject: [PATCH 22/34] fix(mysql-setup): quote database name (#9169)
---
docker/mysql-setup/init.sql | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docker/mysql-setup/init.sql b/docker/mysql-setup/init.sql
index b789329ddfd17..b6a1d47fb2a02 100644
--- a/docker/mysql-setup/init.sql
+++ b/docker/mysql-setup/init.sql
@@ -1,6 +1,6 @@
-- create datahub database
-CREATE DATABASE IF NOT EXISTS DATAHUB_DB_NAME CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
-USE DATAHUB_DB_NAME;
+CREATE DATABASE IF NOT EXISTS `DATAHUB_DB_NAME` CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
+USE `DATAHUB_DB_NAME`;
-- create metadata aspect table
create table if not exists metadata_aspect_v2 (
From c2bc41d15eed31f89076913f641298ded5219a4f Mon Sep 17 00:00:00 2001
From: david-leifker <114954101+david-leifker@users.noreply.github.com>
Date: Fri, 3 Nov 2023 12:29:31 -0500
Subject: [PATCH 23/34] fix(health): fix health check url authentication
(#9117)
---
.../authentication/AuthenticationRequest.java | 12 ++++
.../filter/AuthenticationFilter.java | 13 ++++-
.../HealthStatusAuthenticator.java | 55 +++++++++++++++++++
.../src/main/resources/application.yml | 2 +
metadata-service/health-servlet/build.gradle | 22 --------
.../openapi/config/SpringWebConfig.java | 2 -
.../health}/HealthCheckController.java | 30 ++++++----
metadata-service/war/build.gradle | 1 -
.../webapp/WEB-INF/openapiServlet-servlet.xml | 2 +-
settings.gradle | 1 -
10 files changed, 101 insertions(+), 39 deletions(-)
create mode 100644 metadata-service/auth-impl/src/main/java/com/datahub/authentication/authenticator/HealthStatusAuthenticator.java
delete mode 100644 metadata-service/health-servlet/build.gradle
rename metadata-service/{health-servlet/src/main/java/com/datahub/health/controller => openapi-servlet/src/main/java/io/datahubproject/openapi/health}/HealthCheckController.java (79%)
diff --git a/metadata-auth/auth-api/src/main/java/com/datahub/authentication/AuthenticationRequest.java b/metadata-auth/auth-api/src/main/java/com/datahub/authentication/AuthenticationRequest.java
index 91f15f9d5ae61..5673bac5442b2 100644
--- a/metadata-auth/auth-api/src/main/java/com/datahub/authentication/AuthenticationRequest.java
+++ b/metadata-auth/auth-api/src/main/java/com/datahub/authentication/AuthenticationRequest.java
@@ -1,6 +1,8 @@
package com.datahub.authentication;
import com.datahub.plugins.auth.authentication.Authenticator;
+import lombok.Getter;
+
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
@@ -13,14 +15,24 @@
* Currently, this class only hold the inbound request's headers, but could certainly be extended
* to contain additional information like the request parameters, body, ip, etc as needed.
*/
+@Getter
public class AuthenticationRequest {
private final Map<String, String> caseInsensitiveHeaders;
+ private final String servletInfo;
+ private final String pathInfo;
+
public AuthenticationRequest(@Nonnull final Map<String, String> requestHeaders) {
+ this("", "", requestHeaders);
+ }
+
+ public AuthenticationRequest(@Nonnull String servletInfo, @Nonnull String pathInfo, @Nonnull final Map<String, String> requestHeaders) {
Objects.requireNonNull(requestHeaders);
caseInsensitiveHeaders = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
caseInsensitiveHeaders.putAll(requestHeaders);
+ this.servletInfo = servletInfo;
+ this.pathInfo = pathInfo;
}
/**
diff --git a/metadata-service/auth-filter/src/main/java/com/datahub/auth/authentication/filter/AuthenticationFilter.java b/metadata-service/auth-filter/src/main/java/com/datahub/auth/authentication/filter/AuthenticationFilter.java
index e15918a813158..8c7b3ac8b98f0 100644
--- a/metadata-service/auth-filter/src/main/java/com/datahub/auth/authentication/filter/AuthenticationFilter.java
+++ b/metadata-service/auth-filter/src/main/java/com/datahub/auth/authentication/filter/AuthenticationFilter.java
@@ -2,6 +2,7 @@
import com.datahub.authentication.authenticator.AuthenticatorChain;
import com.datahub.authentication.authenticator.DataHubSystemAuthenticator;
+import com.datahub.authentication.authenticator.HealthStatusAuthenticator;
import com.datahub.authentication.authenticator.NoOpAuthenticator;
import com.datahub.authentication.token.StatefulTokenService;
import com.datahub.plugins.PluginConstant;
@@ -29,6 +30,7 @@
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
+import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
@@ -148,7 +150,7 @@ private void buildAuthenticatorChain() {
}
private AuthenticationRequest buildAuthContext(HttpServletRequest request) {
- return new AuthenticationRequest(Collections.list(request.getHeaderNames())
+ return new AuthenticationRequest(request.getServletPath(), request.getPathInfo(), Collections.list(request.getHeaderNames())
.stream()
.collect(Collectors.toMap(headerName -> headerName, request::getHeader)));
}
@@ -242,7 +244,14 @@ private void registerNativeAuthenticator(AuthenticatorChain authenticatorChain,
final Authenticator authenticator = clazz.newInstance();
// Successfully created authenticator. Now init and register it.
log.debug(String.format("Initializing Authenticator with name %s", type));
- authenticator.init(configs, authenticatorContext);
+ if (authenticator instanceof HealthStatusAuthenticator) {
+ Map<String, Object> authenticatorConfig = new HashMap<>(Map.of(SYSTEM_CLIENT_ID_CONFIG,
+ this.configurationProvider.getAuthentication().getSystemClientId()));
+ authenticatorConfig.putAll(Optional.ofNullable(internalAuthenticatorConfig.getConfigs()).orElse(Collections.emptyMap()));
+ authenticator.init(authenticatorConfig, authenticatorContext);
+ } else {
+ authenticator.init(configs, authenticatorContext);
+ }
log.info(String.format("Registering Authenticator with name %s", type));
authenticatorChain.register(authenticator);
} catch (Exception e) {
diff --git a/metadata-service/auth-impl/src/main/java/com/datahub/authentication/authenticator/HealthStatusAuthenticator.java b/metadata-service/auth-impl/src/main/java/com/datahub/authentication/authenticator/HealthStatusAuthenticator.java
new file mode 100644
index 0000000000000..5749eacf5d25d
--- /dev/null
+++ b/metadata-service/auth-impl/src/main/java/com/datahub/authentication/authenticator/HealthStatusAuthenticator.java
@@ -0,0 +1,55 @@
+package com.datahub.authentication.authenticator;
+
+import com.datahub.authentication.Actor;
+import com.datahub.authentication.ActorType;
+import com.datahub.authentication.Authentication;
+import com.datahub.authentication.AuthenticationException;
+import com.datahub.authentication.AuthenticationRequest;
+import com.datahub.authentication.AuthenticatorContext;
+import com.datahub.plugins.auth.authentication.Authenticator;
+import lombok.extern.slf4j.Slf4j;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+import static com.datahub.authentication.AuthenticationConstants.SYSTEM_CLIENT_ID_CONFIG;
+
+
+/**
+ * This Authenticator is used for allowing access for unauthenticated health check endpoints
+ *
+ * It exists to support load balancers, liveness/readiness checks
+ *
+ */
+@Slf4j
+public class HealthStatusAuthenticator implements Authenticator {
+ private static final Set<String> HEALTH_ENDPOINTS = Set.of(
+ "/openapi/check/",
+ "/openapi/up/"
+ );
+ private String systemClientId;
+
+ @Override
+ public void init(@Nonnull final Map<String, Object> config, @Nullable final AuthenticatorContext context) {
+ Objects.requireNonNull(config, "Config parameter cannot be null");
+ this.systemClientId = Objects.requireNonNull((String) config.get(SYSTEM_CLIENT_ID_CONFIG),
+ String.format("Missing required config %s", SYSTEM_CLIENT_ID_CONFIG));
+ }
+
+ @Override
+ public Authentication authenticate(@Nonnull AuthenticationRequest context) throws AuthenticationException {
+ Objects.requireNonNull(context);
+ if (HEALTH_ENDPOINTS.stream().anyMatch(prefix -> String.join("", context.getServletInfo(), context.getPathInfo()).startsWith(prefix))) {
+ return new Authentication(
+ new Actor(ActorType.USER, systemClientId),
+ "",
+ Collections.emptyMap()
+ );
+ }
+ throw new AuthenticationException("Authorization not allowed. Non-health check endpoint.");
+ }
+}
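The new authenticator grants a system identity only when the servlet path plus path info starts with one of the health-check prefixes. A small, hypothetical Python rendering of that prefix check, just to make the matching rule concrete:

```python
# Prefixes mirrored from HEALTH_ENDPOINTS in the patch above.
HEALTH_ENDPOINTS = ("/openapi/check/", "/openapi/up/")

def is_health_request(servlet_path: str, path_info: str) -> bool:
    # The authenticator concatenates servlet path and path info before matching.
    full_path = f"{servlet_path}{path_info or ''}"
    return full_path.startswith(HEALTH_ENDPOINTS)

print(is_health_request("/openapi", "/check/ready"))  # True -> system identity
print(is_health_request("/api", "/v2/graphql"))       # False -> AuthenticationException
```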
diff --git a/metadata-service/configuration/src/main/resources/application.yml b/metadata-service/configuration/src/main/resources/application.yml
index b817208672e08..91b10a75c922e 100644
--- a/metadata-service/configuration/src/main/resources/application.yml
+++ b/metadata-service/configuration/src/main/resources/application.yml
@@ -11,6 +11,8 @@ authentication:
# Key used to validate incoming tokens. Should typically be the same as authentication.tokenService.signingKey
signingKey: ${DATAHUB_TOKEN_SERVICE_SIGNING_KEY:WnEdIeTG/VVCLQqGwC/BAkqyY0k+H8NEAtWGejrBI94=}
salt: ${DATAHUB_TOKEN_SERVICE_SALT:ohDVbJBvHHVJh9S/UA4BYF9COuNnqqVhr9MLKEGXk1O=}
+ # Required for unauthenticated health check endpoints - best not to remove.
+ - type: com.datahub.authentication.authenticator.HealthStatusAuthenticator
# Normally failures are only warnings, enable this to throw them.
logAuthenticatorExceptions: ${METADATA_SERVICE_AUTHENTICATOR_EXCEPTIONS_ENABLED:false}
diff --git a/metadata-service/health-servlet/build.gradle b/metadata-service/health-servlet/build.gradle
deleted file mode 100644
index 6095f724b3cd4..0000000000000
--- a/metadata-service/health-servlet/build.gradle
+++ /dev/null
@@ -1,22 +0,0 @@
-apply plugin: 'java'
-
-dependencies {
-
- implementation project(':metadata-service:factories')
-
- implementation externalDependency.guava
- implementation externalDependency.reflections
- implementation externalDependency.springBoot
- implementation externalDependency.springCore
- implementation externalDependency.springDocUI
- implementation externalDependency.springWeb
- implementation externalDependency.springWebMVC
- implementation externalDependency.springBeans
- implementation externalDependency.springContext
- implementation externalDependency.slf4jApi
- compileOnly externalDependency.lombok
- implementation externalDependency.antlr4Runtime
- implementation externalDependency.antlr4
-
- annotationProcessor externalDependency.lombok
-}
\ No newline at end of file
diff --git a/metadata-service/openapi-servlet/src/main/java/io/datahubproject/openapi/config/SpringWebConfig.java b/metadata-service/openapi-servlet/src/main/java/io/datahubproject/openapi/config/SpringWebConfig.java
index 71e8c79a2275a..e4f49df90c392 100644
--- a/metadata-service/openapi-servlet/src/main/java/io/datahubproject/openapi/config/SpringWebConfig.java
+++ b/metadata-service/openapi-servlet/src/main/java/io/datahubproject/openapi/config/SpringWebConfig.java
@@ -44,7 +44,6 @@ public GroupedOpenApi defaultOpenApiGroup() {
.group("default")
.packagesToExclude(
"io.datahubproject.openapi.operations",
- "com.datahub.health",
"io.datahubproject.openapi.health"
).build();
}
@@ -55,7 +54,6 @@ public GroupedOpenApi operationsOpenApiGroup() {
.group("operations")
.packagesToScan(
"io.datahubproject.openapi.operations",
- "com.datahub.health",
"io.datahubproject.openapi.health"
).build();
}
diff --git a/metadata-service/health-servlet/src/main/java/com/datahub/health/controller/HealthCheckController.java b/metadata-service/openapi-servlet/src/main/java/io/datahubproject/openapi/health/HealthCheckController.java
similarity index 79%
rename from metadata-service/health-servlet/src/main/java/com/datahub/health/controller/HealthCheckController.java
rename to metadata-service/openapi-servlet/src/main/java/io/datahubproject/openapi/health/HealthCheckController.java
index c200e63e0d497..c90603bf88c31 100644
--- a/metadata-service/health-servlet/src/main/java/com/datahub/health/controller/HealthCheckController.java
+++ b/metadata-service/openapi-servlet/src/main/java/io/datahubproject/openapi/health/HealthCheckController.java
@@ -1,5 +1,6 @@
-package com.datahub.health.controller;
+package io.datahubproject.openapi.health;
+import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import com.linkedin.gms.factory.config.ConfigurationProvider;
import io.swagger.v3.oas.annotations.tags.Tag;
@@ -9,7 +10,6 @@
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
-import java.util.function.Supplier;
import org.opensearch.action.admin.cluster.health.ClusterHealthRequest;
import org.opensearch.action.admin.cluster.health.ClusterHealthResponse;
@@ -27,7 +27,7 @@
@RestController
-@RequestMapping("/check")
+@RequestMapping("/")
@Tag(name = "HealthCheck", description = "An API for checking health of GMS and its clients.")
public class HealthCheckController {
@Autowired
@@ -41,6 +41,12 @@ public HealthCheckController(ConfigurationProvider config) {
this::getElasticHealth, config.getHealthCheck().getCacheDurationSeconds(), TimeUnit.SECONDS);
}
+ @GetMapping(path = "/check/ready", produces = MediaType.APPLICATION_JSON_VALUE)
+ public ResponseEntity<Boolean> getCombinedHealthCheck(String... checks) {
+ return ResponseEntity.status(getCombinedDebug(checks).getStatusCode())
+ .body(getCombinedDebug(checks).getStatusCode().is2xxSuccessful());
+ }
+
/**
* Combined health check endpoint for checking GMS clients.
* For now, just checks the health of the ElasticSearch client
@@ -48,11 +54,10 @@ public HealthCheckController(ConfigurationProvider config) {
* that component). The status code will be 200 if all components are okay, and 500 if one or more components are not
* healthy.
*/
- @GetMapping(path = "/ready", produces = MediaType.APPLICATION_JSON_VALUE)
- public ResponseEntity
-
:::note
Inline markdown or code snippets are not yet supported for field level documentation.
:::
-
### 2. Set up the reporter
The reporter interface enables the source to report statistics, warnings, failures, and other information about the run.
@@ -71,6 +70,8 @@ some [convenience methods](./src/datahub/emitter/mce_builder.py) for commonly us
### 4. Set up the dependencies
+Note: Steps 4-8 are only required if you intend to contribute the source back to the Datahub project.
+
Declare the source's pip dependencies in the `plugins` variable of the [setup script](./setup.py).
### 5. Enable discoverability
@@ -119,37 +120,38 @@ from datahub.ingestion.api.decorators import (
@capability(SourceCapability.LINEAGE_COARSE, "Enabled by default")
class FileSource(Source):
"""
-
- The File Source can be used to produce all kinds of metadata from a generic metadata events file.
+
+ The File Source can be used to produce all kinds of metadata from a generic metadata events file.
:::note
Events in this file can be in MCE form or MCP form.
:::
-
+
"""
... source code goes here
```
-
#### 7.2 Write custom documentation
-- Create a copy of [`source-docs-template.md`](./source-docs-template.md) and edit all relevant components.
+- Create a copy of [`source-docs-template.md`](./source-docs-template.md) and edit all relevant components.
- Name the document as `` and move it to `metadata-ingestion/docs/sources//.md`. For example for the Kafka platform, under the `kafka` plugin, move the document to `metadata-ingestion/docs/sources/kafka/kafka.md`.
- Add a quickstart recipe corresponding to the plugin under `metadata-ingestion/docs/sources//_recipe.yml`. For example, for the Kafka platform, under the `kafka` plugin, there is a quickstart recipe located at `metadata-ingestion/docs/sources/kafka/kafka_recipe.yml`.
- To write platform-specific documentation (that is cross-plugin), write the documentation under `metadata-ingestion/docs/sources//README.md`. For example, cross-plugin documentation for the BigQuery platform is located under `metadata-ingestion/docs/sources/bigquery/README.md`.
#### 7.3 Viewing the Documentation
-Documentation for the source can be viewed by running the documentation generator from the `docs-website` module.
+Documentation for the source can be viewed by running the documentation generator from the `docs-website` module.
##### Step 1: Build the Ingestion docs
+
```console
# From the root of DataHub repo
./gradlew :metadata-ingestion:docGen
```
If this finishes successfully, you will see output messages like:
+
```console
Ingestion Documentation Generation Complete
############################################
@@ -170,7 +172,8 @@ Ingestion Documentation Generation Complete
You can also find documentation files generated at `./docs/generated/ingestion/sources` relative to the root of the DataHub repo. You should be able to locate your specific source's markdown file here and investigate it to make sure things look as expected.
#### Step 2: Build the Entire Documentation
-To view how this documentation looks in the browser, there is one more step. Just build the entire docusaurus page from the `docs-website` module.
+
+To view how this documentation looks in the browser, there is one more step. Just build the entire docusaurus page from the `docs-website` module.
```console
# From the root of DataHub repo
@@ -178,6 +181,7 @@ To view how this documentation looks in the browser, there is one more step. Jus
```
This will generate messages like:
+
```console
...
> Task :docs-website:yarnGenerate
@@ -219,15 +223,15 @@ BUILD SUCCESSFUL in 35s
36 actionable tasks: 16 executed, 20 up-to-date
```
-After this you need to run the following script from the `docs-website` module.
+After this you need to run the following script from the `docs-website` module.
+
```console
cd docs-website
npm run serve
```
-Now, browse to http://localhost:3000 or whichever port npm is running on, to browse the docs.
-Your source should show up on the left sidebar under `Metadata Ingestion / Sources`.
-
+Now, browse to http://localhost:3000 or whichever port npm is running on, to browse the docs.
+Your source should show up on the left sidebar under `Metadata Ingestion / Sources`.
### 8. Add SQL Alchemy mapping (if applicable)
From 4a4c29030c0cfd2da9eab01798bc74a94fbb8c1d Mon Sep 17 00:00:00 2001
From: Harshal Sheth
Date: Mon, 6 Nov 2023 12:47:24 -0800
Subject: [PATCH 30/34] chore: stop ingestion-smoke CI errors on forks (#9160)
---
.github/workflows/docker-ingestion-smoke.yml | 1 +
1 file changed, 1 insertion(+)
diff --git a/.github/workflows/docker-ingestion-smoke.yml b/.github/workflows/docker-ingestion-smoke.yml
index 8d52c23792857..82b57d23609a5 100644
--- a/.github/workflows/docker-ingestion-smoke.yml
+++ b/.github/workflows/docker-ingestion-smoke.yml
@@ -47,6 +47,7 @@ jobs:
name: Build and Push Docker Image to Docker Hub
runs-on: ubuntu-latest
needs: setup
+ if: ${{ needs.setup.outputs.publish == 'true' }}
steps:
- name: Check out the repo
uses: actions/checkout@v3
From 86d2b08d2bbecc90e9adffd250c894abe54667e7 Mon Sep 17 00:00:00 2001
From: Harshal Sheth
Date: Mon, 6 Nov 2023 12:58:07 -0800
Subject: [PATCH 31/34] docs(ingest): inherit capabilities from superclasses
(#9174)
---
metadata-ingestion-modules/airflow-plugin/setup.py | 4 ++++
.../src/datahub/ingestion/api/decorators.py | 12 +++++++++++-
.../source/state/stateful_ingestion_base.py | 8 +++++++-
3 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/metadata-ingestion-modules/airflow-plugin/setup.py b/metadata-ingestion-modules/airflow-plugin/setup.py
index a5af881022d8c..e88fc870cb333 100644
--- a/metadata-ingestion-modules/airflow-plugin/setup.py
+++ b/metadata-ingestion-modules/airflow-plugin/setup.py
@@ -101,6 +101,10 @@ def get_long_description():
f"acryl-datahub[testing-utils]{_self_pin}",
# Extra requirements for loading our test dags.
"apache-airflow[snowflake]>=2.0.2",
+ # Connexion's new version breaks Airflow:
+ # See https://github.com/apache/airflow/issues/35234.
+ # TODO: We should transition to using Airflow's constraints file.
+ "connexion<3",
# https://github.com/snowflakedb/snowflake-sqlalchemy/issues/350
# Eventually we want to set this to "snowflake-sqlalchemy>=1.4.3".
# However, that doesn't work with older versions of Airflow. Instead
diff --git a/metadata-ingestion/src/datahub/ingestion/api/decorators.py b/metadata-ingestion/src/datahub/ingestion/api/decorators.py
index 5e4427047104f..b390ffb9dd036 100644
--- a/metadata-ingestion/src/datahub/ingestion/api/decorators.py
+++ b/metadata-ingestion/src/datahub/ingestion/api/decorators.py
@@ -93,10 +93,20 @@ def capability(
"""
def wrapper(cls: Type) -> Type:
- if not hasattr(cls, "__capabilities"):
+ if not hasattr(cls, "__capabilities") or any(
+ # It's from this class and not a superclass.
+ cls.__capabilities is getattr(base, "__capabilities", None)
+ for base in cls.__bases__
+ ):
cls.__capabilities = {}
cls.get_capabilities = lambda: cls.__capabilities.values()
+ # If the superclasses have capability annotations, copy those over.
+ for base in cls.__bases__:
+ base_caps = getattr(base, "__capabilities", None)
+ if base_caps:
+ cls.__capabilities.update(base_caps)
+
cls.__capabilities[capability_name] = CapabilitySetting(
capability=capability_name, description=description, supported=supported
)
diff --git a/metadata-ingestion/src/datahub/ingestion/source/state/stateful_ingestion_base.py b/metadata-ingestion/src/datahub/ingestion/source/state/stateful_ingestion_base.py
index 7fb2cf9813cab..d11b1f9ad6a53 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/state/stateful_ingestion_base.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/state/stateful_ingestion_base.py
@@ -15,11 +15,12 @@
from datahub.configuration.time_window_config import BaseTimeWindowConfig
from datahub.configuration.validate_field_rename import pydantic_renamed_field
from datahub.ingestion.api.common import PipelineContext
+from datahub.ingestion.api.decorators import capability
from datahub.ingestion.api.ingestion_job_checkpointing_provider_base import (
IngestionCheckpointingProviderBase,
JobId,
)
-from datahub.ingestion.api.source import Source, SourceReport
+from datahub.ingestion.api.source import Source, SourceCapability, SourceReport
from datahub.ingestion.source.state.checkpoint import Checkpoint, StateType
from datahub.ingestion.source.state.use_case_handler import (
StatefulIngestionUsecaseHandlerBase,
@@ -177,6 +178,11 @@ class StatefulIngestionReport(SourceReport):
pass
+@capability(
+ SourceCapability.DELETION_DETECTION,
+ "Optionally enabled via `stateful_ingestion.remove_stale_metadata`",
+ supported=True,
+)
class StatefulIngestionSourceBase(Source):
"""
Defines the base class for all stateful sources.
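The decorator change above starts a fresh `__capabilities` dict whenever the attribute was inherited from a base class and then copies the base classes' entries into it, so annotations like the `DELETION_DETECTION` capability on `StatefulIngestionSourceBase` flow down to subclasses. A stripped-down sketch of that copy-on-first-decoration idea (simplified, not the DataHub decorator itself):

```python
from typing import Callable, Type


def capability(name: str, description: str) -> Callable[[Type], Type]:
    def wrapper(cls: Type) -> Type:
        # Reuse __capabilities only if this class defined it itself; if it was
        # inherited from a base class, start a fresh dict and seed it from the bases.
        if not hasattr(cls, "__capabilities") or any(
            cls.__capabilities is getattr(base, "__capabilities", None)
            for base in cls.__bases__
        ):
            cls.__capabilities = {}
            for base in cls.__bases__:
                cls.__capabilities.update(getattr(base, "__capabilities", None) or {})
        cls.__capabilities[name] = description
        return cls

    return wrapper


@capability("DELETION_DETECTION", "via stateful ingestion")
class BaseSource:
    pass


@capability("LINEAGE_COARSE", "enabled by default")
class MySource(BaseSource):
    pass


print(MySource.__capabilities)    # contains both capabilities
print(BaseSource.__capabilities)  # contains only DELETION_DETECTION
```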
From 2c58c63780970606e50ba95b382dc9ffbde17bfc Mon Sep 17 00:00:00 2001
From: Andrew Sikowitz
Date: Mon, 6 Nov 2023 15:58:57 -0500
Subject: [PATCH 32/34] fix(ingest/datahub-source): Order by version in memory
(#9185)
---
.../source/datahub/datahub_database_reader.py | 100 ++++++++++++++----
.../tests/unit/test_datahub_source.py | 51 +++++++++
2 files changed, 133 insertions(+), 18 deletions(-)
create mode 100644 metadata-ingestion/tests/unit/test_datahub_source.py
diff --git a/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_database_reader.py b/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_database_reader.py
index 96184d8d445e4..e4f1bb275487e 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_database_reader.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_database_reader.py
@@ -1,9 +1,11 @@
import json
import logging
from datetime import datetime
-from typing import Dict, Iterable, Optional, Tuple
+from typing import Any, Generic, Iterable, List, Optional, Tuple, TypeVar
from sqlalchemy import create_engine
+from sqlalchemy.engine import Row
+from typing_extensions import Protocol
from datahub.emitter.aspect import ASPECT_MAP
from datahub.emitter.mcp import MetadataChangeProposalWrapper
@@ -20,6 +22,62 @@
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
+class VersionOrderable(Protocol):
+ createdon: Any # Should restrict to only orderable types
+ version: int
+
+
+ROW = TypeVar("ROW", bound=VersionOrderable)
+
+
+class VersionOrderer(Generic[ROW]):
+ """Orders rows by (createdon, version == 0).
+
+ That is, orders rows first by createdon, and for equal timestamps, puts version 0 rows last.
+ """
+
+ def __init__(self, enabled: bool):
+ # Stores all version 0 aspects for a given createdon timestamp
+ # Once we have emitted all aspects for a given timestamp, we can emit the version 0 aspects
+ # Guaranteeing that, for a given timestamp, we always ingest version 0 aspects last
+ self.queue: Optional[Tuple[datetime, List[ROW]]] = None
+ self.enabled = enabled
+
+ def __call__(self, rows: Iterable[ROW]) -> Iterable[ROW]:
+ for row in rows:
+ yield from self._process_row(row)
+ yield from self._flush_queue()
+
+ def _process_row(self, row: ROW) -> Iterable[ROW]:
+ if not self.enabled:
+ yield row
+ return
+
+ yield from self._attempt_queue_flush(row)
+ if row.version == 0:
+ self._add_to_queue(row)
+ else:
+ yield row
+
+ def _add_to_queue(self, row: ROW) -> None:
+ if self.queue is None:
+ self.queue = (row.createdon, [row])
+ else:
+ self.queue[1].append(row)
+
+ def _attempt_queue_flush(self, row: ROW) -> Iterable[ROW]:
+ if self.queue is None:
+ return
+
+ if row.createdon > self.queue[0]:
+ yield from self._flush_queue()
+
+ def _flush_queue(self) -> Iterable[ROW]:
+ if self.queue is not None:
+ yield from self.queue[1]
+ self.queue = None
+
+
class DataHubDatabaseReader:
def __init__(
self,
@@ -40,13 +98,14 @@ def query(self) -> str:
# Offset is generally 0, unless we repeat the same createdon twice
# Ensures stable order, chronological per (urn, aspect)
- # Version 0 last, only when createdon is the same. Otherwise relies on createdon order
+ # Relies on createdon order to reflect version order
+ # Ordering of entries with the same createdon is handled by VersionOrderer
return f"""
- SELECT urn, aspect, metadata, systemmetadata, createdon
+ SELECT urn, aspect, metadata, systemmetadata, createdon, version
FROM {self.engine.dialect.identifier_preparer.quote(self.config.database_table_name)}
WHERE createdon >= %(since_createdon)s
{"" if self.config.include_all_versions else "AND version = 0"}
- ORDER BY createdon, urn, aspect, CASE WHEN version = 0 THEN 1 ELSE 0 END, version
+ ORDER BY createdon, urn, aspect, version
LIMIT %(limit)s
OFFSET %(offset)s
"""
@@ -54,6 +113,14 @@ def query(self) -> str:
def get_aspects(
self, from_createdon: datetime, stop_time: datetime
) -> Iterable[Tuple[MetadataChangeProposalWrapper, datetime]]:
+ orderer = VersionOrderer[Row](enabled=self.config.include_all_versions)
+ rows = self._get_rows(from_createdon=from_createdon, stop_time=stop_time)
+ for row in orderer(rows):
+ mcp = self._parse_row(row)
+ if mcp:
+ yield mcp, row.createdon
+
+ def _get_rows(self, from_createdon: datetime, stop_time: datetime) -> Iterable[Row]:
with self.engine.connect() as conn:
ts = from_createdon
offset = 0
@@ -69,34 +136,31 @@ def get_aspects(
return
for i, row in enumerate(rows):
- row_dict = row._asdict()
- mcp = self._parse_row(row_dict)
- if mcp:
- yield mcp, row_dict["createdon"]
+ yield row
- if ts == row_dict["createdon"]:
- offset += i
+ if ts == row.createdon:
+ offset += i + 1
else:
- ts = row_dict["createdon"]
+ ts = row.createdon
offset = 0
- def _parse_row(self, d: Dict) -> Optional[MetadataChangeProposalWrapper]:
+ def _parse_row(self, row: Row) -> Optional[MetadataChangeProposalWrapper]:
try:
- json_aspect = post_json_transform(json.loads(d["metadata"]))
- json_metadata = post_json_transform(json.loads(d["systemmetadata"] or "{}"))
+ json_aspect = post_json_transform(json.loads(row.metadata))
+ json_metadata = post_json_transform(json.loads(row.systemmetadata or "{}"))
system_metadata = SystemMetadataClass.from_obj(json_metadata)
return MetadataChangeProposalWrapper(
- entityUrn=d["urn"],
- aspect=ASPECT_MAP[d["aspect"]].from_obj(json_aspect),
+ entityUrn=row.urn,
+ aspect=ASPECT_MAP[row.aspect].from_obj(json_aspect),
systemMetadata=system_metadata,
changeType=ChangeTypeClass.UPSERT,
)
except Exception as e:
logger.warning(
- f"Failed to parse metadata for {d['urn']}: {e}", exc_info=True
+ f"Failed to parse metadata for {row.urn}: {e}", exc_info=True
)
self.report.num_database_parse_errors += 1
self.report.database_parse_errors.setdefault(
str(e), LossyDict()
- ).setdefault(d["aspect"], LossyList()).append(d["urn"])
+ ).setdefault(row.aspect, LossyList()).append(row.urn)
return None
diff --git a/metadata-ingestion/tests/unit/test_datahub_source.py b/metadata-ingestion/tests/unit/test_datahub_source.py
new file mode 100644
index 0000000000000..adc131362b326
--- /dev/null
+++ b/metadata-ingestion/tests/unit/test_datahub_source.py
@@ -0,0 +1,51 @@
+from dataclasses import dataclass
+
+import pytest
+
+from datahub.ingestion.source.datahub.datahub_database_reader import (
+ VersionOrderable,
+ VersionOrderer,
+)
+
+
+@dataclass
+class MockRow(VersionOrderable):
+ createdon: int
+ version: int
+ urn: str
+
+
+@pytest.fixture
+def rows():
+ return [
+ MockRow(0, 0, "one"),
+ MockRow(0, 1, "one"),
+ MockRow(0, 0, "two"),
+ MockRow(0, 0, "three"),
+ MockRow(0, 1, "three"),
+ MockRow(0, 2, "three"),
+ MockRow(0, 1, "two"),
+ MockRow(0, 4, "three"),
+ MockRow(0, 5, "three"),
+ MockRow(1, 6, "three"),
+ MockRow(1, 0, "four"),
+ MockRow(2, 0, "five"),
+ MockRow(2, 1, "six"),
+ MockRow(2, 0, "six"),
+ MockRow(3, 0, "seven"),
+ MockRow(3, 0, "eight"),
+ ]
+
+
+def test_version_orderer(rows):
+ orderer = VersionOrderer[MockRow](enabled=True)
+ ordered_rows = list(orderer(rows))
+ assert ordered_rows == sorted(
+ ordered_rows, key=lambda x: (x.createdon, x.version == 0)
+ )
+
+
+def test_version_orderer_disabled(rows):
+ orderer = VersionOrderer[MockRow](enabled=False)
+ ordered_rows = list(orderer(rows))
+ assert ordered_rows == rows
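For orientation, here is a self-contained sketch of the createdon-plus-offset paging pattern that `_get_rows` above follows. It is an illustration, not the DataHub code — `page_rows`, `execute_page`, and the simplified `Row` tuple are hypothetical stand-ins — but it shows the requirement behind this patch's change from `offset += i` to `offset += i + 1`: after a page whose rows all share one `createdon`, the next query must skip every row already emitted at that timestamp, or the last row of the page is fetched twice.

```python
# Sketch of timestamp + offset paging; execute_page() stands in for running
# the SQL query with %(since_createdon)s / %(limit)s / %(offset)s bound.
from datetime import datetime
from typing import Callable, Iterator, List, Tuple

Row = Tuple[datetime, str]  # (createdon, urn) -- simplified row shape
PageFn = Callable[[datetime, int, int], List[Row]]


def page_rows(execute_page: PageFn, since: datetime, limit: int) -> Iterator[Row]:
    ts, offset = since, 0
    while True:
        rows = execute_page(ts, limit, offset)
        if not rows:
            return
        yield from rows
        last = rows[-1][0]
        # On the next query, skip every row already emitted at the newest
        # timestamp -- the same requirement behind the "offset += i + 1" fix,
        # since "offset += i" re-read the final row of a full same-createdon page.
        emitted_at_last = sum(1 for createdon, _ in rows if createdon == last)
        if ts == last:
            offset += emitted_at_last
        else:
            ts, offset = last, emitted_at_last


if __name__ == "__main__":
    t0 = datetime(2023, 11, 6)
    data = [(t0, "a"), (t0, "b"), (t0, "c")]

    def execute_page(since: datetime, limit: int, offset: int) -> List[Row]:
        eligible = [r for r in data if r[0] >= since]
        return eligible[offset : offset + limit]

    # Three rows, batch size 2: each row is emitted exactly once.
    assert [urn for _, urn in page_rows(execute_page, since=t0, limit=2)] == ["a", "b", "c"]
```

The sketch resets the offset by counting rows at the newest timestamp rather than mirroring the patch line for line; that keeps the example short while preserving the no-duplicates guarantee.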
From f2ce3ab62cc29bd0d4d4cade2577a50a39fa0f32 Mon Sep 17 00:00:00 2001
From: david-leifker <114954101+david-leifker@users.noreply.github.com>
Date: Mon, 6 Nov 2023 15:19:55 -0600
Subject: [PATCH 33/34] lint(frontend): fix HeaderLinks lint error (#9189)
---
.../src/app/shared/admin/HeaderLinks.tsx | 28 +++++++++----------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/datahub-web-react/src/app/shared/admin/HeaderLinks.tsx b/datahub-web-react/src/app/shared/admin/HeaderLinks.tsx
index 3f46f35889fd1..4a7a4938ea970 100644
--- a/datahub-web-react/src/app/shared/admin/HeaderLinks.tsx
+++ b/datahub-web-react/src/app/shared/admin/HeaderLinks.tsx
@@ -105,20 +105,20 @@ export function HeaderLinks(props: Props) {
View and modify your data dictionary
-
+
}
>
From 34aa08b7f38d733adcfe31ca97131e1ea52b49e6 Mon Sep 17 00:00:00 2001
From: John Joyce
Date: Mon, 6 Nov 2023 16:51:05 -0800
Subject: [PATCH 34/34] refactor(ui): Refactor entity page loading indicators
(#9195)
An unrelated smoke test is failing.
---
.../src/app/entity/EntityPage.tsx | 4 +-
.../containers/profile/EntityProfile.tsx | 3 --
.../profile/header/EntityHeader.tsx | 46 +++++++++++--------
.../header/EntityHeaderLoadingSection.tsx | 29 ++++++++++++
.../src/app/lineage/LineageExplorer.tsx | 7 +--
.../src/app/lineage/LineageLoadingSection.tsx | 27 +++++++++++
6 files changed, 86 insertions(+), 30 deletions(-)
create mode 100644 datahub-web-react/src/app/entity/shared/containers/profile/header/EntityHeaderLoadingSection.tsx
create mode 100644 datahub-web-react/src/app/lineage/LineageLoadingSection.tsx
diff --git a/datahub-web-react/src/app/entity/EntityPage.tsx b/datahub-web-react/src/app/entity/EntityPage.tsx
index 09233dbd89f69..916fa41795412 100644
--- a/datahub-web-react/src/app/entity/EntityPage.tsx
+++ b/datahub-web-react/src/app/entity/EntityPage.tsx
@@ -8,7 +8,6 @@ import { useEntityRegistry } from '../useEntityRegistry';
import analytics, { EventType } from '../analytics';
import { decodeUrn } from './shared/utils';
import { useGetGrantedPrivilegesQuery } from '../../graphql/policy.generated';
-import { Message } from '../shared/Message';
import { UnauthorizedPage } from '../authorization/UnauthorizedPage';
import { ErrorSection } from '../shared/error/ErrorSection';
import { VIEW_ENTITY_PAGE } from './shared/constants';
@@ -34,7 +33,7 @@ export const EntityPage = ({ entityType }: Props) => {
const isLineageSupported = entity.isLineageEnabled();
const isLineageMode = useIsLineageMode();
const authenticatedUserUrn = useUserContext()?.user?.urn;
- const { loading, error, data } = useGetGrantedPrivilegesQuery({
+ const { error, data } = useGetGrantedPrivilegesQuery({
variables: {
input: {
actorUrn: authenticatedUserUrn as string,
@@ -71,7 +70,6 @@ export const EntityPage = ({ entityType }: Props) => {
return (
<>
- {loading && }
{error && }
{data && !canViewEntityPage && }
{canViewEntityPage &&
diff --git a/datahub-web-react/src/app/entity/shared/containers/profile/EntityProfile.tsx b/datahub-web-react/src/app/entity/shared/containers/profile/EntityProfile.tsx
index 5384eb94429ed..74c127cb05dd9 100644
--- a/datahub-web-react/src/app/entity/shared/containers/profile/EntityProfile.tsx
+++ b/datahub-web-react/src/app/entity/shared/containers/profile/EntityProfile.tsx
@@ -4,7 +4,6 @@ import { MutationHookOptions, MutationTuple, QueryHookOptions, QueryResult } fro
import styled from 'styled-components/macro';
import { useHistory } from 'react-router';
import { EntityType, Exact } from '../../../../../types.generated';
-import { Message } from '../../../../shared/Message';
import {
getEntityPath,
getOnboardingStepIdsForEntityType,
@@ -274,7 +273,6 @@ export const EntityProfile = ({
}}
>
<>
- {loading && }
{(error && ) ||
(!loading && (
@@ -323,7 +321,6 @@ export const EntityProfile = ({
banner
/>
)}
- {loading && }
{(error && ) || (
{isLineageMode ? (
diff --git a/datahub-web-react/src/app/entity/shared/containers/profile/header/EntityHeader.tsx b/datahub-web-react/src/app/entity/shared/containers/profile/header/EntityHeader.tsx
index 97595a515b34d..69389f5dcf6fc 100644
--- a/datahub-web-react/src/app/entity/shared/containers/profile/header/EntityHeader.tsx
+++ b/datahub-web-react/src/app/entity/shared/containers/profile/header/EntityHeader.tsx
@@ -16,6 +16,7 @@ import ShareButton from '../../../../../shared/share/ShareButton';
import { capitalizeFirstLetterOnly } from '../../../../../shared/textUtil';
import { useUserContext } from '../../../../../context/useUserContext';
import { useEntityRegistry } from '../../../../../useEntityRegistry';
+import EntityHeaderLoadingSection from './EntityHeaderLoadingSection';
const TitleWrapper = styled.div`
display: flex;
@@ -81,7 +82,7 @@ type Props = {
};
export const EntityHeader = ({ headerDropdownItems, headerActionItems, isNameEditable, subHeader }: Props) => {
- const { urn, entityType, entityData } = useEntityData();
+ const { urn, entityType, entityData, loading } = useEntityData();
const refetch = useRefetch();
const me = useUserContext();
const platformName = getPlatformName(entityData);
@@ -99,25 +100,32 @@ export const EntityHeader = ({ headerDropdownItems, headerActionItems, isNameEdi
<>
-
-
-
- {entityData?.deprecation?.deprecated && (
-
- )}
- {entityData?.health && (
- ) || (
+ <>
+
+
+
+ {entityData?.deprecation?.deprecated && (
+
+ )}
+ {entityData?.health && (
+
+ )}
+
+
- )}
-
-
+ >
+ )}
diff --git a/datahub-web-react/src/app/entity/shared/containers/profile/header/EntityHeaderLoadingSection.tsx b/datahub-web-react/src/app/entity/shared/containers/profile/header/EntityHeaderLoadingSection.tsx
new file mode 100644
index 0000000000000..bbf813804edd4
--- /dev/null
+++ b/datahub-web-react/src/app/entity/shared/containers/profile/header/EntityHeaderLoadingSection.tsx
@@ -0,0 +1,29 @@
+import * as React from 'react';
+import { Skeleton, Space } from 'antd';
+import styled from 'styled-components';
+import { ANTD_GRAY } from '../../../constants';
+
+const ContextSkeleton = styled(Skeleton.Input)`
+ && {
+ width: 320px;
+ border-radius: 4px;
+ background-color: ${ANTD_GRAY[3]};
+ }
+`;
+
+const NameSkeleton = styled(Skeleton.Input)`
+ && {
+ width: 240px;
+ border-radius: 4px;
+ background-color: ${ANTD_GRAY[3]};
+ }
+`;
+
+export default function EntityHeaderLoadingSection() {
+ return (
+
+
+
+
+ );
+}
diff --git a/datahub-web-react/src/app/lineage/LineageExplorer.tsx b/datahub-web-react/src/app/lineage/LineageExplorer.tsx
index ed0b26bde11ef..f59d1843b8a99 100644
--- a/datahub-web-react/src/app/lineage/LineageExplorer.tsx
+++ b/datahub-web-react/src/app/lineage/LineageExplorer.tsx
@@ -3,7 +3,6 @@ import { useHistory } from 'react-router';
import { Button, Drawer } from 'antd';
import { InfoCircleOutlined } from '@ant-design/icons';
import styled from 'styled-components';
-import { Message } from '../shared/Message';
import { useEntityRegistry } from '../useEntityRegistry';
import CompactContext from '../shared/CompactContext';
import { EntityAndType, EntitySelectParams, FetchedEntities } from './types';
@@ -18,12 +17,10 @@ import { ErrorSection } from '../shared/error/ErrorSection';
import usePrevious from '../shared/usePrevious';
import { useGetLineageTimeParams } from './utils/useGetLineageTimeParams';
import analytics, { EventType } from '../analytics';
+import LineageLoadingSection from './LineageLoadingSection';
const DEFAULT_DISTANCE_FROM_TOP = 106;
-const LoadingMessage = styled(Message)`
- margin-top: 10%;
-`;
const FooterButtonGroup = styled.div`
display: flex;
justify-content: space-between;
@@ -167,7 +164,7 @@ export default function LineageExplorer({ urn, type }: Props) {
return (
<>
{error && }
- {loading && }
+ {loading && <LineageLoadingSection />}
{!!data && (