HF_HUB_OFFLINE environment variable not being honoured for Neuron cache #741

Open
2 of 4 tasks
unography opened this issue Nov 24, 2024 · 0 comments
Labels
bug Something isn't working

unography commented Nov 24, 2024

System Info

Docker Image:

763104351884.dkr.ecr.{region_name}.amazonaws.com/huggingface-pytorch-training-neuronx:2.1.2-transformers4.43.2-neuronx-py310-sdk2.20.0-ubuntu20.04

Who can help?

@michaelbenayoun

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

When running any custom model for which the Neuron compilation needs to run again, the script attempts to synchronize the cache with the aws-neuron repo.
However, if HF_HUB_OFFLINE is set, this sync should not happen. Instead, the run fails with the following traceback:

File "/opt/ml/code/run_clm.py", line 719, in <module>
    main()
File "/opt/ml/code/run_clm.py", line 658, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1456, in train
    result = super().train(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1116, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 525, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1478, in evaluate
    self.synchronize_hub_cache()
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 332, in synchronize_hub_cache
    has_write_access = has_write_access_to_repo(repo_id)
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/utils/cache_utils.py", line 143, in has_write_access_to_repo
    api.delete_branch(repo_id=repo_id, repo_type="model", branch=f"this-branch-does-not-exist-{uuid4()}")
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 5775, in delete_branch
    response = get_session().delete(url=branch_url, headers=headers)
File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 671, in delete
    return self.request("DELETE", url, **kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 77, in send
    raise OfflineModeIsEnabled(
huggingface_hub.errors.OfflineModeIsEnabled: Cannot reach https://huggingface.co/api/models/aws-neuron/optimum-neuron-cache/branch/this-branch-does-not-exist-fc15be72-5665-4ec6-9921-0fab1600845b: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.

This issue was not present in an earlier version, optimum-neuron==0.0.22.

Expected behavior

No API calls to the Hub should be made when HF_HUB_OFFLINE is set; huggingface_hub.errors.OfflineModeIsEnabled should be handled.
This was already handled in an earlier version, optimum-neuron==0.0.22.
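
To illustrate the expected behavior, here is a minimal sketch of a guard that skips the Hub sync when offline mode is requested. It is not the optimum-neuron implementation: `is_hub_offline`, `maybe_synchronize_hub_cache`, and the set of accepted "true" values are hypothetical names and assumptions made for illustration. Only the `HF_HUB_OFFLINE` variable name comes from the report above.

```python
import os

# Assumption: these spellings count as "offline mode enabled".
# The exact set accepted by huggingface_hub may differ.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def is_hub_offline(environ=None):
    """Return True if HF_HUB_OFFLINE is set to a true-like value."""
    environ = os.environ if environ is None else environ
    return environ.get("HF_HUB_OFFLINE", "").strip().upper() in _TRUE_VALUES


def maybe_synchronize_hub_cache(synchronize, environ=None):
    """Run the given sync callable unless offline mode is enabled.

    `synchronize` stands in for whatever performs the Hub sync
    (hypothetical here). Returns True if the sync ran, False if skipped.
    """
    if is_hub_offline(environ):
        # Offline: make no network calls at all.
        return False
    synchronize()
    return True
```

With a guard like this placed before the write-access check, the call that raises OfflineModeIsEnabled in the traceback above would never be reached while HF_HUB_OFFLINE is set.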

@unography unography added the bug Something isn't working label Nov 24, 2024