Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Azure Blob Storage: Fix overwrite mode #52610

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Jan 28, 2025

context: https://airbytehq-team.slack.com/archives/C083T7SUR0D/p1738094057451909?thread_ts=1738087568.864109&cid=C083T7SUR0D

previously:

  • if you had two streams in OVERWRITE mode, where one stream's name was a suffix of the other (e.g. users and foo_users)
  • during sync startup, we would delete *foo_users/*, then create a blob foo_users/<timestamp>_0
  • and then delete *users/*... including the file we just created for foo_users

This PR (I think) fixes that behavior, so that setup for users only touches files that actually belong to the users stream.

(also, add a comment, b/c this destination is doing some dumb things in general)

@edgao edgao requested a review from a team as a code owner January 28, 2025 20:54
Copy link

vercel bot commented Jan 28, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 29, 2025 5:39pm

// 2. We really should include the namespace in the blob name, but we currently don't...
// So current behavior is that if you have `public1.users` and `public2.users`,
// those files will probably conflict in some way which is undesired.
if (!blob.isDeleted() && blob.getName().startsWith(configuredStream.getStream().getName() + "/")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to unit test this behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to figure that out :/ might need to write something into legacy DATs, which will be not very fun

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a look at 0886140 ? It's modeled after the existing test stuff, which is doing something funky with an azure blob emulation container?

but it fails without the contains -> startsWith change, and passes afterward

@edgao edgao force-pushed the edgao/azure_blob_fix branch from 0886140 to e0a33ca Compare January 29, 2025 17:33
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/azure-blob-storage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants