[Bug]: KeyError: 'gradient_checkpointing' in Linux #553

Open
ReidTissing opened this issue Nov 10, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@ReidTissing

What happened?

On Ubuntu with a conda env, installation succeeds, but running start-ui.sh fails with: KeyError: 'gradient_checkpointing'

What did you expect would happen?

The UI to start.

Relevant log output

[OneTrainer] + /home/anaconda3/bin/conda run --prefix conda_env --no-capture-output python scripts/train_ui.py
Traceback (most recent call last):
  File "/home/OneTrainer/modules/ui/TopBar.py", line 200, in __load_current_config
    loaded_config = default_config.from_dict(loaded_dict).to_unpacked_config()
  File "/home/OneTrainer/modules/util/config/BaseConfig.py", line 70, in from_dict
    data = self.config_migrations[version](data)
  File "/home/OneTrainer/modules/util/config/TrainConfig.py", line 554, in __migration_4
    gradient_checkpointing = migrated_data.pop("gradient_checkpointing")
KeyError: 'gradient_checkpointing'

Output of pip freeze

absl-py==2.1.0
accelerate==1.0.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
async-timeout==4.0.3
attrs==24.2.0
certifi==2024.8.30
charset-normalizer==3.4.0
cloudpickle==3.1.0
coloredlogs==15.0.1
contourpy==1.3.0
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
-e git+https://github.com/huggingface/diffusers.git@e45c25d03aeb0a967d8aaa0f6a79f280f6838e1f#egg=diffusers
filelock==3.16.1
flatbuffers==24.3.25
fonttools==4.54.1
frozenlist==1.5.0
fsspec==2024.10.0
ftfy==6.3.1
grpcio==1.67.1
huggingface-hub==0.26.2
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
invisible-watermark==0.2.0
Jinja2==3.1.4
kiwisolver==1.4.7
lightning-utilities==0.11.8
lion-pytorch==0.2.2
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.2
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@f9edb99bea18da54440c4600894027706b5172ce#egg=mgds
mpmath==1.3.0
multidict==6.1.0
networkx==3.4.2
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-ml-py==12.560.30
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
omegaconf==2.3.0
onnxruntime==1.19.2
open_clip_torch==2.28.0
opencv-python==4.10.0.84
packaging==24.2
pillow==11.0.0
platformdirs==4.3.6
pooch==1.8.2
prodigyopt==1.0
propcache==0.2.0
protobuf==5.28.3
psutil==6.1.0
pydantic==2.9.2
pydantic_core==2.23.4
Pygments==2.18.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
pytorch-lightning==2.4.0
pytorch_optimizer==3.1.2
PyWavelets==1.7.0
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
safetensors==0.4.5
scalene==1.5.45
schedulefree==1.2.7
scipy==1.14.1
sentencepiece==0.2.0
six==1.16.0
sympy==1.13.1
tensorboard==2.18.0
tensorboard-data-server==0.7.2
timm==1.0.11
tokenizers==0.20.3
torch==2.5.1
torchmetrics==1.5.2
torchvision==0.20.1
tqdm==4.66.6
transformers==4.46.0
triton==3.1.0
typing_extensions==4.12.2
urllib3==2.2.3
wcwidth==0.2.13
Werkzeug==3.1.3
yarl==1.17.1
zipp==3.20.2

@ReidTissing ReidTissing added the bug Something isn't working label Nov 10, 2024
@ReidTissing ReidTissing changed the title [Bug]: KeyError: 'gradient_checkpointing' in Linux [Bug]: KeyError: 'gradient_checkpointing' prevents launch in Linux Nov 10, 2024
@ReidTissing ReidTissing changed the title [Bug]: KeyError: 'gradient_checkpointing' prevents launch in Linux [Bug]: KeyError: 'gradient_checkpointing' in Linux Nov 10, 2024
@Arcitec
Contributor

Arcitec commented Nov 18, 2024

Thanks for the report. The stack trace shows that this isn't a Linux-specific bug.

It's trying to load your last used config file, which is outdated, so it puts it through migration. The migration then attempts to fetch the old value of "gradient_checkpointing".

But that key does not exist in your old config file.

So this is a bug with migration of old configs.
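To make the failure mode concrete, here is a minimal sketch of a versioned migration chain hitting a missing key. It is not OneTrainer's actual code: only the `pop("gradient_checkpointing")` call and the idea of looking up a migration by version come from the traceback. The names `migration_4_strict`, `from_dict` (as written here), `MIGRATIONS`, the `"version"` key, and the target key `"gradient_checkpointing_mode"` are all hypothetical.

```python
from typing import Callable

Migration = Callable[[dict], dict]


def migration_4_strict(data: dict) -> dict:
    """Hypothetical step: assumes the old key is always present."""
    migrated = dict(data)
    # Same call as in the traceback: pop() without a default raises
    # KeyError when the key is absent.
    enabled = migrated.pop("gradient_checkpointing")
    # Hypothetical new key, only here so the step does something visible.
    migrated["gradient_checkpointing_mode"] = "ON" if enabled else "OFF"
    return migrated


def from_dict(data: dict, migrations: dict[int, Migration], latest: int) -> dict:
    """Run every registered migration from the file's version up to `latest`."""
    version = data.get("version", 0)  # the "version" key name is an assumption
    while version < latest:
        step = migrations.get(version)
        if step is not None:
            data = step(data)
        version += 1
    return {**data, "version": latest}


MIGRATIONS = {4: migration_4_strict}

# A config that still has the old key migrates cleanly.
complete = from_dict({"version": 4, "gradient_checkpointing": True}, MIGRATIONS, latest=5)

# A config without the key fails in the same way as the report.
try:
    from_dict({"version": 4}, MIGRATIONS, latest=5)
    error = None
except KeyError as exc:
    error = exc
```

The second call fails inside the migration step rather than in any platform-specific code, which matches the reading that the operating system is not involved.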

  1. How old was your previous OneTrainer installation? If it hasn't been updated in a very long time, this might be expected, but it would still be good to improve migration so that it supports old configs which lack that key.
  2. Is your OneTrainer fully updated to the latest master code?
  3. If you can zip your config JSON directory and there are no privacy concerns with sharing it, the config files would help us debug what is missing in them and verify that it's fixed. You can upload zips by dragging them into this comment field.

@Nerogar
Owner

Nerogar commented Nov 18, 2024

Should be fixed now. Using .pop() with no default values was a dumb mistake. Config files can always be incomplete.
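As a rough illustration of that kind of fix (a sketch, not the actual commit), `dict.pop` accepts a second argument that is returned when the key is missing. The function name `migration_4_tolerant`, the fallback value `False`, the target key `"gradient_checkpointing_mode"`, and the sample `"batch_size"` entry are assumptions made for this example.

```python
def migration_4_tolerant(data: dict) -> dict:
    """Hypothetical migration step that tolerates incomplete config files."""
    migrated = dict(data)
    # pop(key, default) returns the default instead of raising KeyError.
    # The fallback value False is an assumption for this sketch.
    enabled = migrated.pop("gradient_checkpointing", False)
    # Hypothetical new key, only here so the step does something visible.
    migrated["gradient_checkpointing_mode"] = "ON" if enabled else "OFF"
    return migrated


# Works both when the old key is present and when it is missing.
from_complete = migration_4_tolerant({"gradient_checkpointing": True, "batch_size": 4})
from_incomplete = migration_4_tolerant({"batch_size": 4})
```

Which fallback is appropriate depends on what the application treats as the default for that setting.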

@Arcitec
Contributor

Arcitec commented Nov 19, 2024

Yeah, I see that all the migrations have fallback values now. Looks good.

@ReidTissing Please try with the latest master code if possible and let us know how it goes. Should work now. :)
