-
@apethree could you give a bit more detail? What is the use case for preventing multiple ML engineers from using the same data for training? What is the layout of the project: one repo or multiple repos? Do you use a DVC data registry?
-
Thanks for your response. Typical pipeline:
Problem: Solutions looking for: Example:
-
Hello,
My use case is to store large structured/unstructured data files in repos as training data, but only unique files. Two employees might be working on the same file in two different repos/folders without knowing it.
Example: Engineer 1 is working on FileA and has its versions and cleaned data; Engineer 2 has FileB. But what if they were both working on the same file? I would like DVC to compute a hash on commit, reject any further commit of a file whose hash already exists, and show some kind of error.
Ex:
I'm not sure how to achieve this workflow. I know DVC does a good job of managing versions of the same file; what I'm after is very similar to a UNIQUE constraint in SQL.
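One possible way to get the UNIQUE-constraint behavior described above, outside of DVC itself, is a shared registry of content hashes that is checked before each commit. This is a minimal sketch under stated assumptions: the registry is an SQLite table with a UNIQUE hash column (in practice it would need to be a database both engineers can reach, not a per-repo file), files are identified by MD5 (the same hash DVC uses to address files in its cache), and the names `file_md5`, `open_registry`, `register`, and `DuplicateFileError` are hypothetical, not part of DVC's API. Calling it from a Git pre-commit hook is likewise an assumed integration.

```python
import hashlib
import sqlite3
from pathlib import Path


class DuplicateFileError(Exception):
    """Raised when a file's content hash is already registered (hypothetical, not a DVC API)."""


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading in chunks so large data files never load fully."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical shared registry; md5 is UNIQUE like the SQL analogy."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " md5 TEXT NOT NULL UNIQUE,"
        " path TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def register(conn: sqlite3.Connection, path: Path) -> str:
    """Record a file's hash, or raise DuplicateFileError if identical content is already known."""
    md5 = file_md5(path)
    try:
        with conn:  # commits on success, rolls back if the UNIQUE constraint fails
            conn.execute("INSERT INTO files (md5, path) VALUES (?, ?)", (md5, str(path)))
    except sqlite3.IntegrityError:
        (existing,) = conn.execute(
            "SELECT path FROM files WHERE md5 = ?", (md5,)
        ).fetchone()
        raise DuplicateFileError(
            f"{path} has the same content as already-registered {existing} (md5 {md5})"
        ) from None
    return md5


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        file_a = Path(tmp, "file_a.csv")
        file_b = Path(tmp, "file_b.csv")
        file_a.write_text("x,y\n1,2\n")
        file_b.write_text("x,y\n1,2\n")  # same bytes as file_a, different name
        conn = open_registry()
        register(conn, file_a)
        try:
            register(conn, file_b)
        except DuplicateFileError as err:
            print("rejected:", err)
```

Run as a script, this registers `file_a.csv` and then rejects `file_b.csv` because its bytes are identical. Note that a content hash only catches exact copies: if one engineer cleans or re-saves the file, it gets a new hash and is treated as a different file.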