feat: Provide optional lock to prevent concurrent pipeline execution #105
Comments
@aaronsteers, any thoughts on how we could help out with this? It would likely just be a matter of checking whether the same plugin:command is already executing, right?
Feature-wise, we could declare a new plugin command property specifying that only one copy of the command can run at a time. That limit would need to be per environment so that, for instance, prod would never be blocked by devtest. The challenge is that I don't know whether the way we log commands today works the same way it does for EL. In theory, though, this could definitely work. A second approach would be to add a dummy "command" before and after the dbt execution. The first dummy command would "take" a lock and the second would "release" it. You'd probably want the lock to have a max age so it can self-heal, and you'd probably want an explicit command to "release" the lock in cases where you know the process holding it is no longer running. A third option, and the one I like best, would be to build the second solution into the dbt-ext plugin itself and/or into the EDK, using pre-hooks and post-hooks to do the same thing inline. The challenge then would be where to store the lock artifact, which could be easy or hard depending on the deployment scenario.
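As a rough illustration of the second approach, here is a minimal Python sketch of a take/release lock with a max age. It assumes the lock is a plain file in a shared directory; the `PIPELINE_LOCK_DIR` variable, the file naming, the function names, and the one-hour default max age are all hypothetical choices, not anything Meltano provides. The lock is keyed by environment so prod and devtest never block each other.

```python
import json
import os
import time
from pathlib import Path

# Hypothetical lock location; where to store the lock artifact depends on the deployment.
LOCK_DIR = Path(os.environ.get("PIPELINE_LOCK_DIR", "/tmp/meltano-locks"))


def lock_path(environment: str, name: str) -> Path:
    # One lock file per environment, so a long prod run never blocks devtest.
    return LOCK_DIR / f"{environment}--{name}.lock"


def take_lock(environment: str, name: str, max_age_s: float = 3600.0) -> bool:
    """Atomically create the lock file; return False if a fresh lock already exists."""
    path = lock_path(environment, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    for _ in range(2):
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                age = time.time() - path.stat().st_mtime
            except FileNotFoundError:
                continue  # released between open() and stat(); try once more
            if age <= max_age_s:
                return False  # held by a run that is presumably still going
            # Self-heal: treat an old lock as stale and remove it.
            # Caveat: not race-free if two runs detect staleness at the same moment.
            path.unlink(missing_ok=True)
            continue
        with os.fdopen(fd, "w") as f:
            json.dump({"pid": os.getpid(), "taken_at": time.time()}, f)
        return True
    return False


def release_lock(environment: str, name: str) -> None:
    """Explicit release; can also be run by hand when the holder is known to be dead."""
    lock_path(environment, name).unlink(missing_ok=True)
```

In a pipeline, `take_lock` and `release_lock` would be exposed as the "before" and "after" dummy commands (for example through a hypothetical utility plugin). If dbt fails, the release step never runs, which is exactly why the max age matters.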
I built a version of @aaronsteers's option 2. It is available here. It's not pretty, but it seems to do the trick. I think option 3 is ultimately the ideal, since adding two extra commands makes for an ugly pipeline command. :( Extending the EDK is beyond my capabilities at the moment, though. ;)
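For the third option, the lock can live inline as a pre-hook/post-hook pair around the dbt step. This hedged Python sketch uses hypothetical names (`exclusive_run` and `LockHeld` are not EDK APIs). It acquires the lock before the wrapped step and releases it in a `finally` block, so the post-hook runs even when dbt fails and no separate release command is needed.

```python
import contextlib
import os
from pathlib import Path
from typing import Iterator


class LockHeld(RuntimeError):
    """Raised when another run already holds the lock."""


@contextlib.contextmanager
def exclusive_run(lock_file: Path) -> Iterator[None]:
    # "Pre-hook": atomically create the lock file, or refuse to start.
    try:
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(f"{lock_file} exists; another run appears to be in progress") from None
    os.close(fd)
    try:
        yield
    finally:
        # "Post-hook": runs even if the wrapped step raised, so the lock is not left behind.
        lock_file.unlink(missing_ok=True)
```

Usage would look like `with exclusive_run(Path("/tmp/prod--dbt.lock")): invoke_dbt()`, where `invoke_dbt` stands in for whatever actually launches dbt. A hard kill (for example SIGKILL) skips the `finally` block, so a max age or manual release would still be worth having.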
Meltano Version: 2.8.0
Python Version: 3.8
Bug scope: CLI (options, error messages, logging, etc.)
Operating System: Linux Ubuntu
Description
If we run
meltano run tap-something some-mapper target-something
and that pipeline is already running, Meltano (correctly!) throws an "already running" error and exits. However, if we instead run
meltano run tap-something some-mapper target-something dbt-postgres:run
and then run the same command again while the first is still in progress, Meltano runs the entire pipeline a second time, ultimately resulting in multiple copies of the same dbt project running at once. :(
If it matters, we execute Meltano via cron. The tap/mapper/target portion usually takes only a few minutes, but dbt often takes 20+ minutes to run. We had planned to schedule the job every 15 minutes and let Meltano block concurrent runs whenever dbt ran long, but unfortunately this behavior prevents that.
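Until something like this exists in Meltano itself, a cron schedule like the one described can be protected from the outside by wrapping the whole `meltano run` invocation in an OS-level advisory lock. Below is a minimal, Unix-only Python sketch; `run_if_idle` and the lock-file path are hypothetical. Because the kernel drops an `flock` lock when the holding process exits, a crashed run cannot leave a stale lock behind.

```python
import fcntl
import subprocess
import sys
from typing import Sequence


def run_if_idle(lock_path: str, cmd: Sequence[str]) -> int:
    """Run cmd unless another invocation holds lock_path; return its exit code, or 0 if skipped."""
    with open(lock_path, "a") as lock_file:
        try:
            # Non-blocking exclusive lock; fails immediately if another run holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            print(f"previous run still holds {lock_path}; skipping", file=sys.stderr)
            return 0
        # The lock stays held for the whole pipeline, including the long dbt step.
        # subprocess closes inherited fds by default, so the child does not keep the lock.
        return subprocess.run(list(cmd)).returncode
```

For example, `run_if_idle("/tmp/meltano-prod.lock", ["meltano", "run", "tap-something", "some-mapper", "target-something", "dbt-postgres:run"])`. Returning 0 when skipping is a design choice so that cron does not report an overlapping tick as a failure. The util-linux `flock -n` command achieves the same thing directly in a crontab line.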