DVC Live hangs when pushing to remote #822

Open
mkdjr opened this issue May 15, 2024 · 3 comments
Labels: performance (improvement over resource / time consuming tasks), triage


mkdjr commented May 15, 2024

Hi there!

When I call live.end() after logging metrics and artifacts, DVC starts creating the experiment and pushing it to the remote (I have exp.auto_push set to True), but my terminal has been stuck on the following progress bar for 20+ minutes:

[Screenshot: the stalled progress bar]

I believe something is still printing to the terminal, because any text I type disappears within 5-10 seconds.

Please let me know what other information I can provide. Thanks for the awesome project.

shcheklein (Member) commented:
@mkdjr could you please try to interrupt it and get the stack trace?

Also, could you please run it with export DVCLIVE_LOGLEVEL=TRACE so we can see whether that gives us more information? Thanks!

If you could also try a very small script, that would be helpful too (just to see whether this is about DVCLive or something specific to your environment).

mkdjr (Author) commented May 16, 2024

Thanks for the response! I set the log level to TRACE, and when I interrupted the command after about 60 seconds, this is the error I got:

100% Adding...|██████████████████████████████████████████████████████████████████████████████████████████████████|1/1 [00:00, 21.31file/s]
^CTraceback (most recent call last):
  File "/home/mkdjr/technical-language-vs-sensed-values-alignment/run_experiment.py", line 225, in <module>
    live.end()
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvclive/live.py", line 949, in end
    self.save_dvc_exp()
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvclive/utils.py", line 182, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvclive/live.py", line 979, in save_dvc_exp
    self._experiment_rev = self._dvc_repo.experiments.save(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/experiments/__init__.py", line 359, in save
    return save(self.repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/experiments/save.py", line 36, in save
    save_result = executor.save(
                  ^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/experiments/executor/base.py", line 302, in save
    with cls.auto_push(dvc):
  File "/home/mkdjr/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/experiments/executor/base.py", line 692, in auto_push
    cls._auto_push(dvc, git_remote)
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/experiments/executor/base.py", line 713, in _auto_push
    dvc.experiments.push(
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/experiments/__init__.py", line 364, in push
    return push(self.repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/scm_context.py", line 143, in run
    return method(repo, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/experiments/push.py", line 126, in push
    result["uploaded"] = _push_cache(repo, pushed_refs_info, **kwargs)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/experiments/push.py", line 182, in _push_cache
    return repo.push(
           ^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/repo/push.py", line 98, in push
    used_run_cache = self.stage_cache.push(remote) if run_cache else []
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/stage/cache.py", line 281, in push
    return self.transfer(self.repo.cache.legacy, dest_odb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc/stage/cache.py", line 264, in transfer
    if to_fs.exists(key) and first(to_fs.find(key)):
                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/funcy/seqs.py", line 63, in first
    return next(iter(seq), None)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 529, in find
    yield from self.fs.find(path)
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc_objects/fs/local.py", line 78, in find
    for root, _, files in self.walk(path, **kwargs):
  File "/home/mkdjr/.pyenv/versions/pytorch-env/lib/python3.11/site-packages/dvc_objects/fs/local.py", line 56, in walk
    for root, dirs, files in os.walk(
  File "<frozen os>", line 358, in _walk
KeyboardInterrupt

After waiting at most an hour and a half, the push did finally finish with the following output:

100% Adding...|██████████████████████████████████████████████████████████████████████████████████████████████████|1/1 [00:00, 26.70file/s]
Collecting                                                                                                    |2.60M [02:58, 14.6kentry/s]
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
md5: a47bf585eda8e10aaa30cdc7e8d469e5.dir
md5: 942738b9064eacca5fbc81116da2bcc7.dir
md5: 6d2f4af81551a0db2fb4fe619b170a44.dir
md5: 573a5adbfdc20e7cb4f6bd00c3ff4242.dir
md5: e614a27f24b38624e91efc347d234afe.dir
md5: a6c7ce3d01e43ee3b11bc9d150be82e8.dir
md5: 830f328697740e3e8784ae050019b2fd.dir
md5: 2370c7201fb662516b962d576580eb7a.dir
md5: cb6185032f0071cbf16c65b8253c266c.dir
md5: a8765898706eda9536b4663db181e291.dir
md5: b842c74ada14da3e2fbee2dba22dca37.dir
md5: 4d0e83696cf45200524646a18c44cdca.dir
md5: 2f24f25b84879aaecd0db0a429ac41fd
md5: ce570dc2720216b7027eb87e5d66adcf.dir
md5: 5cdc0918a13fea894176803ed7677af3.dir
md5: 8a93dbebc046b5ffa0e9134fd9044ca8.dir
md5: 3672004bcc42945481f7e94a50a1b228.dir
md5: a9a3ad024dc69e2cb116eb710b20ea1c.dir
md5: fd0545d65e6bdb26b6552d5c4b76a8dc.dir
md5: e268e8cf24c4ad4a113200eb1604ef58.dir
md5: 548121f283fa53bfead3145e1e385408.dir
md5: 9aba1c782098bdb0ba6a13acbbc262dc.dir
md5: 1f7e8cbefb9eabb688fbaf642bd43876.dir
md5: af596aaa1e251b0799d74a53078edb33.dir
md5: 138cb203e2e6744e0d3d479611d3a5a5.dir
md5: a4e0c75b40834ef21392a1b7c53bb809.dir
md5: f8f6d94af6759deb65b241f9139c9902.dir
md5: 430a804f11d99c0ad85bab105fce7436.dir
md5: bb340a69a3c21502a5a6f049ff0803bc.dir
md5: 06906c90cba632f31ebd9fa3e21940b7.dir
md5: c20cad6350046bfc1df63af6c0dee3d5.dir
md5: cd6b61422f387a9ac11988c71bcd4fcf.dir
md5: d7d62aa0cb2003264562971a896cbad1.dir
md5: a2fe28d609382459eca10fd6d9627208.dir
md5: 3cdd4dd76627cf74d0f09a5012cdedc4.dir
md5: b1d1bc97d0de33a8b9ae04e23f954ab0.dir
md5: d54642c401dd41f53f4a33fc6ca891d8.dir
md5: a6d0fcb7ffeae9ac1fabd48156a1a081.dir
md5: d751713988987e9331980363e24189ce.dir
md5: 99e0012da3764659f8d3d2442fdfdf62.dir
md5: 76fcf0f391c620a3801545728e68c7b7.dir
md5: f038560ccfe371248021ea9820ceb1ff.dir
md5: 72bcfe0becd167bb7b1274f9b6ec0399.dir
md5: c8f09e42ded208823838f96c39e507d8.dir
md5: 6e8a71e9b81ecb94c56a8e637599004e.dir
md5: dd7d5a0e384dde906a03bd26611e2ad2.dir
md5: c26a713ecc0b5fb8ce88ddfecae216c2.dir
md5: 570a2a9e56e88d423c5d167bd5f49a8e
md5: af72bdd60ffe4e9286c985401a57661d.dir
md5: 430c93f785a4db98ae62ba632c898d1c.dir
md5: 01bad613e0b60fe0f4111371dc7dd5d4.dir
md5: 132c4a69581b62798c9820ffab679a97.dir
md5: c9da98a0276c74ebd31c4d4785e65166.dir
md5: 67d238bc3501f8f3a1575fd8abe4c87a.dir
md5: d6a13ece470ac76b91b8e4e9b79b774d.dir
md5: 28c5dc2686f41b0dbed51360e9d61f7c.dir
md5: 709725ba19bb5f97aedbed7d2edd8f54.dir
md5: fb13ba103951ef9603e8b71be75adffa.dir
md5: 0e90846e5c2944dd7f9f68f7fc30634a.dir
md5: 384c040285e6f2eb9f13a51d219100c2.dir
PushingWARNING:dvc_data.index.fetch:Some of the cache files do not exist neither locally nor on remote. Missing cache files:
md5: a47bf585eda8e10aaa30cdc7e8d469e5.dir
md5: 942738b9064eacca5fbc81116da2bcc7.dir
md5: 6d2f4af81551a0db2fb4fe619b170a44.dir
md5: 573a5adbfdc20e7cb4f6bd00c3ff4242.dir
md5: e614a27f24b38624e91efc347d234afe.dir
md5: a6c7ce3d01e43ee3b11bc9d150be82e8.dir
md5: 830f328697740e3e8784ae050019b2fd.dir
md5: 2370c7201fb662516b962d576580eb7a.dir
md5: cb6185032f0071cbf16c65b8253c266c.dir
md5: a8765898706eda9536b4663db181e291.dir
md5: b842c74ada14da3e2fbee2dba22dca37.dir
md5: 4d0e83696cf45200524646a18c44cdca.dir
md5: 2f24f25b84879aaecd0db0a429ac41fd
md5: ce570dc2720216b7027eb87e5d66adcf.dir
md5: 5cdc0918a13fea894176803ed7677af3.dir
md5: 8a93dbebc046b5ffa0e9134fd9044ca8.dir
md5: 3672004bcc42945481f7e94a50a1b228.dir
md5: a9a3ad024dc69e2cb116eb710b20ea1c.dir
md5: fd0545d65e6bdb26b6552d5c4b76a8dc.dir
md5: e268e8cf24c4ad4a113200eb1604ef58.dir
md5: 548121f283fa53bfead3145e1e385408.dir
md5: 9aba1c782098bdb0ba6a13acbbc262dc.dir
md5: 1f7e8cbefb9eabb688fbaf642bd43876.dir
md5: af596aaa1e251b0799d74a53078edb33.dir
md5: 138cb203e2e6744e0d3d479611d3a5a5.dir
md5: a4e0c75b40834ef21392a1b7c53bb809.dir
md5: f8f6d94af6759deb65b241f9139c9902.dir
md5: 430a804f11d99c0ad85bab105fce7436.dir
md5: bb340a69a3c21502a5a6f049ff0803bc.dir
md5: 06906c90cba632f31ebd9fa3e21940b7.dir
md5: c20cad6350046bfc1df63af6c0dee3d5.dir
md5: cd6b61422f387a9ac11988c71bcd4fcf.dir
md5: d7d62aa0cb2003264562971a896cbad1.dir
md5: a2fe28d609382459eca10fd6d9627208.dir
md5: 3cdd4dd76627cf74d0f09a5012cdedc4.dir
md5: b1d1bc97d0de33a8b9ae04e23f954ab0.dir
md5: d54642c401dd41f53f4a33fc6ca891d8.dir
md5: a6d0fcb7ffeae9ac1fabd48156a1a081.dir
md5: d751713988987e9331980363e24189ce.dir
md5: 99e0012da3764659f8d3d2442fdfdf62.dir
md5: 76fcf0f391c620a3801545728e68c7b7.dir
md5: f038560ccfe371248021ea9820ceb1ff.dir
md5: 72bcfe0becd167bb7b1274f9b6ec0399.dir
md5: c8f09e42ded208823838f96c39e507d8.dir
md5: 6e8a71e9b81ecb94c56a8e637599004e.dir
md5: dd7d5a0e384dde906a03bd26611e2ad2.dir
md5: c26a713ecc0b5fb8ce88ddfecae216c2.dir
md5: 570a2a9e56e88d423c5d167bd5f49a8e
md5: af72bdd60ffe4e9286c985401a57661d.dir
md5: 430c93f785a4db98ae62ba632c898d1c.dir
md5: 01bad613e0b60fe0f4111371dc7dd5d4.dir
md5: 132c4a69581b62798c9820ffab679a97.dir
md5: c9da98a0276c74ebd31c4d4785e65166.dir
md5: 67d238bc3501f8f3a1575fd8abe4c87a.dir
md5: d6a13ece470ac76b91b8e4e9b79b774d.dir
md5: 28c5dc2686f41b0dbed51360e9d61f7c.dir
md5: 709725ba19bb5f97aedbed7d2edd8f54.dir
md5: fb13ba103951ef9603e8b71be75adffa.dir
md5: 0e90846e5c2944dd7f9f68f7fc30634a.dir
md5: 384c040285e6f2eb9f13a51d219100c2.dir
Pushing
WARNING: The following untracked files were present in the workspace before saving but will not be included in the experiment commit:
        run_experiment_cmd.py, cache/embeddings/Lowercase-NFD-StripAccents-Whitespace-all-MiniLM-L6-v2-100/embedding.json, cache/embeddings/Lowercase-NFD-StripAccents-Whitespace-all-MiniLM-L6-v2-100/embedding.csv, cache/embeddings/Lowercase-NFD-StripAccents-Whitespace-all-MiniLM-L6-v2-100/embedding.parquet, cache/corpora/Lowercase-NFD-StripAccents-Whitespace/corpus.json, cache/corpora/Lowercase-NFD-StripAccents-Whitespace/corpus.parquet, cache/corpora/Lowercase-NFD-StripAccents-Whitespace/corpus.csv, models/Lowercase-NFD-StripAccents-Whitespace-all-MiniLM-L6-v2-100-Linear/study.db
WARNING:dvc.repo.experiments.executor.base:The following untracked files were present in the workspace before saving but will not be included in the experiment commit:
        run_experiment_cmd.py, cache/embeddings/Lowercase-NFD-StripAccents-Whitespace-all-MiniLM-L6-v2-100/embedding.json, cache/embeddings/Lowercase-NFD-StripAccents-Whitespace-all-MiniLM-L6-v2-100/embedding.csv, cache/embeddings/Lowercase-NFD-StripAccents-Whitespace-all-MiniLM-L6-v2-100/embedding.parquet, cache/corpora/Lowercase-NFD-StripAccents-Whitespace/corpus.json, cache/corpora/Lowercase-NFD-StripAccents-Whitespace/corpus.parquet, cache/corpora/Lowercase-NFD-StripAccents-Whitespace/corpus.csv, models/Lowercase-NFD-StripAccents-Whitespace-all-MiniLM-L6-v2-100-Linear/study.db

shcheklein (Member) commented:
So, it looks like it's just pushing a lot of data to the remote storage.

Do you have DVC_EXP_AUTO_PUSH or the exp.auto_push config option enabled by chance?
Does your pipeline produce a lot of data (new data every time)?
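One way to check the "lot of data" hypothesis is to measure how large the directory being walked actually is. The traceback ends in os.walk inside dvc_objects/fs/local.py, which suggests the remote is a local filesystem path, and the final log reports collecting about 2.60M entries. The sketch below is a hypothetical, stdlib-only helper (summarize_tree is not part of DVC or DVCLive) that counts files and bytes under a directory and times the walk. Which directory to point it at (for example the local remote's path or the project's .dvc/cache) is an assumption you would adapt to your setup.

```python
import os
import time


def summarize_tree(root: str) -> tuple[int, int, float]:
    """Return (file_count, total_bytes, elapsed_seconds) for files under root.

    Hypothetical diagnostic helper, not a DVC API. It walks the tree with
    os.walk, the same stdlib call at the bottom of the traceback, so the
    elapsed time gives a rough idea of how expensive a full walk is.
    A nonexistent root yields (0, 0, ~0.0) because os.walk ignores errors
    by default.
    """
    start = time.perf_counter()
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: measure symlinks themselves instead of their targets
                total_bytes += os.lstat(path).st_size
            except OSError:
                # A file may be removed while we are walking; skip it
                continue
            file_count += 1
    return file_count, total_bytes, time.perf_counter() - start
```

For example, summarize_tree("/path/to/local/remote") returning millions of files and a walk time of several minutes would be consistent with the slow push being dominated by scanning storage rather than by DVCLive itself.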

shcheklein self-assigned this on May 19, 2024
shcheklein added the triage and performance labels on May 19, 2024