option --always-changed of dvc run
#5925
Replies: 11 comments
-
@dashohoxha In both cases the connection between
Closing as it seems to be resolved. Please feel free to reopen if I've missed anything.
-
What about this:
dvc run -d ... -o s3://my-bucket/folder.parquet -o marker './create_dir.sh && ./generate_effective_md5.sh'
dvc run -d s3://my-bucket/folder.parquet -d marker -o ... './do_something_with_dir.sh'
I think this is equivalent to the original example (and still doesn't need
dvc run -d ... -o s3://my-bucket/folder.parquet ./create_dir.sh
dvc run -d s3://my-bucket/folder.parquet -o marker --always-changed ./generate_effective_md5.sh
dvc run -d marker -o ... ./do_something_with_dir.sh
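The commands above leave generate_effective_md5.sh unspecified. As a rough sketch only, assuming the directory is reachable as a local path and that md5sum is installed (for an S3 prefix the listing would have to come from somewhere else), such a script could reduce the directory to a single digest and write it to the marker. The function name effective_md5 and the usage path are hypothetical:

```shell
#!/bin/sh
# Hypothetical stand-in for generate_effective_md5.sh.
# Assumption: the directory is a local path; the thread's example uses an S3 prefix.

# Print one digest that changes whenever a file under "$1" is added,
# removed, renamed, or modified.
effective_md5() {
    (
        cd "$1" || exit 1
        # Hash every file in a stable order, then hash the list of hashes.
        # File names containing newlines are not handled by this sketch.
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            md5sum "$f"
        done
    ) | md5sum | cut -d ' ' -f 1
}

# Usage (hypothetical path): effective_md5 path/to/dir > marker
```

Because each per-file line includes the file name, a rename changes the digest even when the contents stay the same.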
I don't see why this is the case. It seems more like a personal preference.
This is a mind-boggling discussion for me, and I am not sure I can follow it. But it seems that there is no general agreement on what to do about that issue. So I don't know why that issue is still open while this one was closed immediately, without any proper discussion.
Maybe I would have done it, but I don't have permissions to reopen issues.
-
It would always run
Because the definition "considered always changed if there are no deps" is implicit, as it doesn't contain a clear marker, whereas "considered always changed if always_changed == True" is an explicit one. I'm not sure how this is a personal preference.
I guess I was wrong about this one being resolved. Reopening.
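The two definitions quoted above can be contrasted with a toy decision rule. This is an illustrative model of the behavior described in this thread, not DVC's actual code; the function stage_is_changed and its three arguments are hypothetical:

```shell
#!/bin/sh
# Toy model only; NOT DVC's implementation.
# $1: number of dependencies declared by the stage
# $2: "true" if the stage is explicitly marked always-changed
# $3: "true" if at least one dependency changed since the last run
# Returns 0 (changed, so the command is rerun) or 1 (unchanged).
stage_is_changed() {
    deps_count=$1
    always_changed=$2
    any_dep_changed=$3

    # Explicit rule: the reason is written in the stage definition.
    if [ "$always_changed" = "true" ]; then
        return 0
    fi

    # Implicit rule: a stage without dependencies (a "callback stage")
    # is treated as changed, although nothing in it says so.
    if [ "$deps_count" -eq 0 ]; then
        return 0
    fi

    # Ordinary stage: rerun only when a dependency changed.
    [ "$any_dep_changed" = "true" ]
}
```

In this model both rules rerun the command every time; the difference is only whether the reason is visible in the stage itself.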
-
Maybe this one is better:
dvc run -d ... -o s3://my-bucket/folder.parquet -o marker './create_dir.sh && ./generate_effective_md5.sh'
dvc run -d marker -o ... './do_something_with_dir.sh'
So, the point is that with callback stages it is possible to solve all the problems that can be solved with
Maybe we have a terminology problem. If we call a stage that has no dependencies "unconditioned" (or "unrestricted", or "unconstrained", or "independent", etc.), it would feel natural to execute its command every time.
-
@dashohoxha Good point, that one would work, but at the expense of mashing stages together :)
With callback stages and mashing commands together, yes :)
In 1.0 we'll consider dropping that behavior in favor of explicit
That is a nice way to put it! But that is definitely not something very obvious, unlike the dead-simple --always-changed.
-
Also at the benefit of using what is available :)
Do what you think is best. But I am in favor of not changing callback stages, except for their name (for example calling them "unconditioned" changes).
-
@dashohoxha's concerns are valid.
@efiop, I think it is, since we don't have a design philosophy for DVC (at least, not a written one). There are people who prefer a minimal design:
Others are on the opposite side, rooting for a monolithic tool that does everything.
@efiop, I'm still not sure what the next steps for this issue are, though; agreeing that we are not going to deprecate
-
The Zen of Python, "explicit is better than implicit", is IMO a good argument for
However, to the root of this issue: the proposed solutions may solve most of the problems
Issue #2378 is about giving DVC the ability to check the status of files that are not entirely under its control. Integrating commands together to produce the marker file assumes that the only file changes we need to be able to detect are changes that are caused by a
-
I am also for removing
BTW,
is not smashing stages together. This is a single stage; separating it is an error, which led to the original issue and confusion. This is how dvc operates: outputs glue stages together. If some stage generates something that is not a file, e.g. updates database state or something, then we simulate an output with a marker, which is effectively an output from dvc's POV. So I am for removing
-
Thinking a bit more about it, we should handle the situation where a non-file dep (A) might be modified by some activity external to dvc; in that case we will have a stale marker, which will no longer correspond to the state of the non-file dep. The solution is a callback stage updating the marker each time; then any stage dependent on A may still rely on depending on the marker. Then there is an ordering issue. Say A might be changed both in the dvc process and externally; then we will have several stages:
An update A stage:
cmd: update_A && update_marker
outs:
path: marker
deps: maybe some
An A status stage:
cmd: update_marker
outs:
path: marker
And a dependent stage:
cmd: work_with_A
deps:
path: marker
We have several issues here, which, I think, we all can ignore:
The real requirements are to always update the marker (to handle outside activity) and to always update the marker after updating A. Both are satisfied here. The real issue, which we can't simply handle, is reproducibility, though. Say we want to rerun a dependent stage on an older state of A: we can't; we only do that for files. We may discuss what we should (or shouldn't) do about that.
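The two requirements can be exercised with a small local simulation. It is only a sketch under assumptions: a plain file stands in for the non-file dependency A, md5sum is assumed to be installed, and the names update_marker, run_dependent, and the recorded file (playing the role of the marker checksum remembered from the last run) are hypothetical:

```shell
#!/bin/sh
# Simulation of the marker pattern; none of this is DVC code.
WORK=$(mktemp -d)
A="$WORK/A"                 # stand-in for the external, non-file dependency
MARKER="$WORK/marker"       # the file that dependent stages actually depend on
RECORDED="$WORK/recorded"   # marker state seen by the last dependent run

# "A status" stage: always run, so the marker reflects the current state of A.
update_marker() {
    md5sum < "$A" | cut -d ' ' -f 1 > "$MARKER"
}

# Dependent stage: refresh the marker first (the always-run status stage),
# then do the work only if the marker differs from the recorded one.
run_dependent() {
    update_marker
    if [ -f "$RECORDED" ] && cmp -s "$MARKER" "$RECORDED"; then
        echo "skipped"
    else
        cp "$MARKER" "$RECORDED"
        echo "ran"
    fi
}

echo "v1" > "$A"
first=$(run_dependent)      # no recorded state yet
second=$(run_dependent)     # A untouched
echo "v2" > "$A"            # change made outside the pipeline
third=$(run_dependent)      # the fresh marker exposes the change
echo "$first $second $third"   # prints: ran skipped ran
```

Only the latest digest of A survives in this sketch, which is the reproducibility limitation described above: an older state of A cannot be restored from the marker.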
-
A general takeaway from all of this: we should define where dvc's responsibility ends, for example at non-file, non-dvc-managed deps. Trying to stretch our responsibility beyond those limits means bringing together mutually contradictory objectives and won't lead to anything good.
-
There is a discussion that illustrates the usage of --always-changed with this example:
I was thinking if we could solve the same problem with a callback stage, like this:
or like this:
dvc run -o marker './create_dir.sh && ./generate_effective_md5.sh'
dvc run -d marker -o ... ./do_something_with_dir.sh
It seems like these are equivalent to the first example.
Then I realized that maybe the option --always-changed is not needed at all, since in all the cases we can convert it to a callback stage that is equivalent.
Indeed, if --always-changed ignores the dependencies, then just don't specify any dependencies to the stage, making it a callback stage, and it will always be executed.
I wonder if there are any cases when an --always-changed stage cannot be converted to an equivalent callback stage.