DataHub v0.8.25.1
Release Highlights
Buckle up, folks! v0.8.25 brings some very exciting (and highly requested!) updates.
Notable UI-Based Features
UI-based Ingestion - as demoed in the December Town Hall, you can now create, configure, schedule, & execute batch metadata ingestion from the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.
Data Domains - DataHub now supports grouping data assets into logical collections called Domains. Domains are curated, top-level folders or categories into which related assets can be explicitly grouped. See the Domains User Guide (datahub-project#4038).
Data Containers - DataHub now supports Containers, which represent the physical grouping of entities; e.g., a Schema is a container of 1 or more Datasets, and a Dashboard is a container of 1 or more Charts.
Notable Metadata Model & Ingestion-Based Features
Data Quality test results are now supported in the DataHub metadata model. This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here). Future releases will include a Great Expectations integration & UI support - we’re on track to complete this in Q1 as planned.
Avro files are now supported in the Data Lake File ingestion source
Ingest metadata from multiple instances of the same platform type. This has been a very common use case within the Community - you can now differentiate multiple instances of the same platform type! If you already have pre-existing entries, use the datahub migrate command to migrate them over to platform instances.
Ignore users from Top Users calculation
feat(ingestion): Adding ability to ignore users from top users calculation by @treff7es in datahub-project#3735
BigQuery - Data Profiling on only the latest partition/shard
feat(ingestion) bigquery: Profiling only the latest partition/shard on bigquery by @treff7es in datahub-project#3930
(feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in datahub-project#3813
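To make the platform-instance item above more concrete, here is a minimal sketch of how a dataset URN might change once an instance is set. It assumes the instance name is prefixed to the dataset name inside the URN; the helper `sketch_dataset_urn` and the instance name `core_finance` are hypothetical and are not part of the DataHub client library.

```python
from typing import Optional


def sketch_dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a dataset URN string (illustrative helper, not a DataHub API).

    Assumption: when a platform instance is given, it is prepended to the
    dataset name, separated by a dot.
    """
    qualified_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified_name},{env})"


if __name__ == "__main__":
    # Without an instance, two MySQL servers holding "db.orders" would collide.
    print(sketch_dataset_urn("mysql", "db.orders"))
    # With an instance, the same table name maps to a distinct URN.
    print(sketch_dataset_urn("mysql", "db.orders", platform_instance="core_finance"))
```

Under that assumption, entries ingested before an instance was configured carry the un-prefixed URN, which is why pre-existing entries need the datahub migrate command mentioned above.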
Notable Fixes
Fix to support View in Looker
feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in datahub-project#3985
fix(graphql): support group display name in ownership by @thomasplarsson in datahub-project#3979
fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in datahub-project#3990
fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in datahub-project#3926
DataHub Usage Guides
docs(domains): Adding a User Guide for Domains by @jjoyce0510 in datahub-project#4038
docs(ingest): Adding UI ingestion guide by @jjoyce0510 in datahub-project#4048
What's Changed
fix(vulnerability): Upgrade gms base image by @dexter-mh-lee in datahub-project#3962
logging(frontend): Improve OIDC debug logs by @jjoyce0510 in datahub-project#3967
docs(delete): add curl request example to delete entity by @anshbansal in datahub-project#3928
fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in datahub-project#3926
Feature/dynamic platform icons by @RyanHolstien in datahub-project#3968
refactor(ingestion): remove duplicate aspect type by @hsheth2 in datahub-project#3972
fix(example): fix typo by @anshbansal in datahub-project#3907
fix(ingestion): Restrict python to <=3.9.9 by @treff7es in datahub-project#3961
feat(build): remove requirement for git directory for builds by @swaroopjagadish in datahub-project#3977
fix(ingestion): tighten conditions for restli json transformation by @hsheth2 in datahub-project#3973
fix(ingestion): don't dump variables for config errors by @hsheth2 in datahub-project#3974
Bugfix/increase socket timeout by @RyanHolstien in datahub-project#3982
feat(ingest): support for Avro data lake files by @kevinhu in datahub-project#3913
fix(build): exclude old log4j core by @RickardCardell in datahub-project#3966
fix(quickstart): Pin Quickstart version to v0.8.23. by @jjoyce0510 in datahub-project#3983
feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in datahub-project#3985
fix(graphql): support group display name in ownership by @thomasplarsson in datahub-project#3979
fix(quickstart): Assign correct mysql-setup container for M1 and remove "head" default version. by @jjoyce0510 in datahub-project#3987
feat(embedded search results): support custom endpoints in embedded search result by @gabe-lyons in datahub-project#3986
fix(docker): datahub-gms - build in native, copy to target by @swaroopjagadish in datahub-project#3992
fix(ci): moving defaults back to head now that docker builds are green by @swaroopjagadish in datahub-project#3993
feat(ui): UI-based ingestion (as featured in Dec Townhall) by @jjoyce0510 in datahub-project#3975
quickstart: Adding UI ingestion to quickstart YAML by @jjoyce0510 in datahub-project#3994
feat(domains): Adding backend for Asset Domains (p1) by @jjoyce0510 in datahub-project#3952
Bug: a bug fix to bigquery_to_datahub.yml file by @dipeshmaurya in datahub-project#3988
fix(ingest): check if feature data type is present by @maaaikoool in datahub-project#3932
feat(platform-instance): a simple client-only change to support platf… by @swaroopjagadish in datahub-project#3996
docs(metadata-model): Adding to Metadata model docs by @jjoyce0510 in datahub-project#3998
Add Stash Logo & new Source Icons by @maggiehays in datahub-project#4002
feat(domains): UI for Asset Domains (p2) by @jjoyce0510 in datahub-project#3995
docs: add missing back tick for metadata-ingestion/README.md by @nickwu241 in datahub-project#4003
Bugfix/add missing classes by @RyanHolstien in datahub-project#4000
fix(superset): fix connection for redshift by @anshbansal in datahub-project#3944
fix(setup): fix setup for M1 by @anshbansal in datahub-project#3958
docs:add Optum logo by @maggiehays in datahub-project#4005
Refining Metadata Model docs further by @jjoyce0510 in datahub-project#4001
fix(docker): Alpine based multiplatform docker build for kafka-setup by @treff7es in datahub-project#3991
Bugfix/graph concurrency issue by @RyanHolstien in datahub-project#4007
feat(ingest): Add additional snowflake auth by @MikeSchlosser16 in datahub-project#4009
fix(ci): Reverting unnecessary domain test changes by @jjoyce0510 in datahub-project#4013
fix(metrics): Add metrics for mcl hooks by @dexter-mh-lee in datahub-project#4008
feat(platform) - Update FabricType enum to represent more fabrics by @aditya-radhakrishnan in datahub-project#3997
feat(ingest): emit flags and stats for profiling telemetry by @kevinhu in datahub-project#3969
fix(formatting): fix linting lib version requirement by @anshbansal in datahub-project#3939
fix(docs): fix business glossary docs by @anshbansal in datahub-project#3916
fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in datahub-project#3990
fix(docs): update gms link by @lhvubtqn in datahub-project#3927
fix(ingest): lint fix a few files by @swaroopjagadish in datahub-project#4016
fix(ingest): adding platform instance urn to data platform instance aspects by @swaroopjagadish in datahub-project#4015
feat(ingest): use trino python client for sqlalchemy, supports python… by @mayurinehate in datahub-project#3888
fix(spark-lineage): select mock server port dynamically for unit test by @MugdhaHardikar-GSLab in datahub-project#4018
(feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in datahub-project#3813
Test/add concurrency issue smoke test by @RyanHolstien in datahub-project#4014
feat(glossary-terms): Index glossary term custom properties by @jjoyce0510 in datahub-project#3960
feat(ingestion): Adding ability to ignore users from top users calculation by @treff7es in datahub-project#3735
Docs/remote deploy and auto render by @RyanHolstien in datahub-project#4020
fix(ingest): snowflake - Run authentication validation if default value used by @treff7es in datahub-project#4024
feat(nifi): handle provenance api variation for older versions by @mayurinehate in datahub-project#4022
feat(ingestion) bigquery: Profiling only the latest partition/shard on bigquery by @treff7es in datahub-project#3930
fix(groups): Fix UI encoding of groups with spaces in urns by @jjoyce0510 in datahub-project#4021
fix(text): fix confusing text by @anshbansal in datahub-project#4025
fix(clean): add missing cleanup by @anshbansal in datahub-project#4023
feat(containers): Backend for Asset Containers (as demo'd in townhall) by @jjoyce0510 in datahub-project#4019
fix(docs): Adding Initiate login uri to okta docs (Okta OIDC) by @jjoyce0510 in datahub-project#4030
fix: docker-compose now persists kafka broker data by @icy in datahub-project#4031
feat(ingestion): Support Kafka confluent external schema resolution by name or subject by @rslanka in datahub-project#4035
docs(domains): Adding a User Guide for Domains by @jjoyce0510 in datahub-project#4038
feat(Stateful Ingestion-3/3): Client side changes for Monitoring/Reporting by @rslanka in datahub-project#3807
feat(containers): Adding Containers UI (as demo'd in Jan Townhall) by @jjoyce0510 in datahub-project#4037
feat(users): adding user graphql mutation by @gabe-lyons in datahub-project#4033
feat(ingest): add tests for platform instance by @swaroopjagadish in datahub-project#4047
feat(model): Data quality model by @ksrinath in datahub-project#3787
Bugfix/prevent invalid urn by @RyanHolstien in datahub-project#4045
refactor(spark-lineage): remove dependency of spark from McpEmitter by @MugdhaHardikar-GSLab in datahub-project#4042
feat(analytics): add more analytics for entities by @anshbansal in datahub-project#4040
docs(ingest): Adding UI ingestion guide by @jjoyce0510 in datahub-project#4048
fix(mae-consumer-docker): Fix condition for skipping elasticsearch check by @dexter-mh-lee in datahub-project#4052
feat(ci): pin tox requirements to speed up ci runs, remove airflow-1 … by @swaroopjagadish in datahub-project#4055
feat(container): Add domains aspect to container. by @jjoyce0510 in datahub-project#4059
feat(profile) - bigquery: Fix for hitting limit with too many partitioned tables by @treff7es in datahub-project#4056
[Docs] Mark data lake metadata source as Beta by @pedro93 in datahub-project#4061
feat(ingest): log CLI invocations and completions by @kevinhu in datahub-project#4062
fix(ingest): Add aws dependencies for data lake by @kevinhu in datahub-project#4060
fix(ingest) - add aws_common as a snowflake_common dependency by @aditya-radhakrishnan in datahub-project#4054
feat(ui): Add svg datahub loading logo by @eburairu in datahub-project#4065
refactor(models): Refactoring new Assertion models by @jjoyce0510 in datahub-project#4064
feat(cli): add --force option to ingest rollback subcommand by @danilopeixoto in datahub-project#4032
fix(analytics): fix missing events from UI by @anshbansal in datahub-project#4026
Data domain containers ingestion by @treff7es in datahub-project#4051
docs(ingestion) glue: document required IAM permissions by @iasoon in datahub-project#3929
fix(profile):bigquery - Check for every table if it is partitioned to not hit table quota by @treff7es in datahub-project#4074
New Contributors
@dipeshmaurya made their first contribution in datahub-project#3988
@maaaikoool made their first contribution in datahub-project#3932
@icy made their first contribution in datahub-project#4031
@ksrinath made their first contribution in datahub-project#3787
@eburairu made their first contribution in datahub-project#4065
@danilopeixoto made their first contribution in datahub-project#4032
Full Changelog: datahub-project/datahub@v0.8.24...v0.8.25