Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ADLS hint from import #7581

Merged
merged 5 commits into from
Mar 20, 2024
Merged

Remove ADLS hint from import #7581

merged 5 commits into from
Mar 20, 2024

Conversation

N-o-Z
Copy link
Member

@N-o-Z N-o-Z commented Mar 19, 2024

Closes #7580

Change Description

Remove ADLS hint from import

Background

We are saving the physical address of imported objects in ADLS Gen2 accounts wrong which creates issues when using the physical address (to export tables for example)

Bug Fix

  1. Problem detected when trying to export tables of data that was imported from and adls gen 2 account
  2. On investigation we detected that the physical address is saved incorrectly for these type of accounts due to the hint (change in the url from blob to adls
  3. The hint is not longer necessary due to improvements of the import mechanism, we can use only one adapter for it.
  4. The quickest and best way to fix was to remove the hint and the old Azure BlobWalker

Testing Details

Verified by current unit and system tests

Breaking Change?

Yes - will require users who use the adls hint to stop using it

@N-o-Z N-o-Z requested a review from a team March 19, 2024 20:05
@N-o-Z N-o-Z self-assigned this Mar 19, 2024
@N-o-Z N-o-Z added the include-changelog PR description should be included in next release changelog label Mar 19, 2024
Copy link

github-actions bot commented Mar 19, 2024

♻️ PR Preview d9337ee has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

10 passed

@N-o-Z N-o-Z requested a review from guy-har March 19, 2024 20:18
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worried about backwards compatibility: if someone has an old workflow or even script:

  • We prefer not to break it
  • We must give a useful error message if they try.

As is, we must document this change, and it seems like a major version bump.

As an alternative, we might translate the modified URLs back to their original form, and ideally issue a warning that we did so. (Other alternatives are possible, of course!)

@@ -568,11 +568,11 @@ func (a *Adapter) CompleteMultiPartUpload(ctx context.Context, obj block.ObjectP

func (a *Adapter) GetStorageNamespaceInfo() block.StorageNamespaceInfo {
info := block.DefaultStorageNamespaceInfo(block.BlockstoreTypeAzure)
info.ImportValidityRegex = `^https?://[a-z0-9_-]+\.(blob|adls)\.core\.windows\.net` // added adls for import hint validation in UI
info.ImportValidityRegex = `^https?://[a-z0-9_-]+\.blob\.core\.windows\.net`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks any existing import scripts. Does this mean we need a major release 🤢 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated in the PR description - this is a breaking change. But I believe we can avoid a major release since this affects specific users we can identify

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a WA for lakectl so it doesn't break. Effectively this is only going to impact WebUI users (I'm fine with that)

### Azure Data Lake Gen2
{:.no_toc}

lakeFS requires a hint in the import source URL to understand that the provided storage account is ADLS Gen2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document the change. Otherwise someone who's already imported has no way of understanding what went wrong!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added deprecation message

@N-o-Z
Copy link
Member Author

N-o-Z commented Mar 20, 2024

Worried about backwards compatibility: if someone has an old workflow or even script:

  • We prefer not to break it
  • We must give a useful error message if they try.

As is, we must document this change, and it seems like a major version bump.

As an alternative, we might translate the modified URLs back to their original form, and ideally issue a warning that we did so. (Other alternatives are possible, of course!)

We need to get rid of this - the sooner the better.
This "trick" was done for lack of any other viable alternative, and now that it's not required we should remove it. This already caused us bugs in the pre-signed url and delta export.
I think it's better to break it now while we have a handful of users who might use import in Azure ADLS Gen2, this way we can probably avoid releasing a major lakeFS version.

@N-o-Z
Copy link
Member Author

N-o-Z commented Mar 20, 2024

@arielshaqed Added also backwards compatibility for the delta exporter

@N-o-Z N-o-Z requested a review from arielshaqed March 20, 2024 09:52
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines 166 to 167
Warning("'adls' hint will be deprecated soon, please use the original source url for import")
source = strings.Replace(source, "adls.core.windows.net", "blob.core.windows.net", 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to say what is happening, not only what isn't:

Suggested change
Warning("'adls' hint will be deprecated soon, please use the original source url for import")
source = strings.Replace(source, "adls.core.windows.net", "blob.core.windows.net", 1)
Warning("'adls' hint will be deprecated soon, please use the original source url for import")
translatedSource = strings.Replace(source, "adls.core.windows.net", "blob.core.windows.net", 1)
if source != translatedSource {
Fmt(" Using %s\n", translatedSource)
}

@N-o-Z N-o-Z enabled auto-merge (squash) March 20, 2024 18:12
@N-o-Z N-o-Z merged commit c6d0c12 into master Mar 20, 2024
36 checks passed
@N-o-Z N-o-Z deleted the fix/remove-adls-hint-7580 branch March 20, 2024 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Import with adls hint creates physical addresses with adls in domain
2 participants