KeyError: Parameter containing #2205

Closed
Amerehei opened this issue Nov 8, 2024 · 23 comments · Fixed by huggingface/transformers#35212


Amerehei commented Nov 8, 2024

I want to run the SFT example, but I get some errors. Can you help me find the problem?

I run run_peft_fsdp.sh with --model_name_or_path "meta-llama/Llama-2-7b-hf" (I used a smaller model, just for testing purposes).

I use pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04. Here are my environment details and errors:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Packages
Name: transformers
Version: 4.47.0.dev0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft, trl, unsloth_zoo
---
Name: accelerate
Version: 1.1.0.dev0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.11/dist-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: peft, trl, unsloth_zoo
---
Name: peft
Version: 0.13.3.dev0
Summary: Parameter-Efficient Fine-Tuning (PEFT)
Home-page: https://github.com/huggingface/peft
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.11/dist-packages
Requires: accelerate, huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch, tqdm, transformers
Required-by: unsloth_zoo
---
Name: trl
Version: 0.13.0.dev0
Summary: Train transformer language models with reinforcement learning.
Home-page: https://github.com/huggingface/trl
Author: Leandro von Werra
Author-email: [email protected]
License: Apache 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: accelerate, datasets, rich, transformers
Required-by: unsloth_zoo
---
Name: datatrove
Version: 0.3.0
Summary: HuggingFace library to process and filter large amounts of webdata
Home-page:
Author:
Author-email: "HuggingFace Inc." 
License: Apache-2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: dill, fsspec, huggingface-hub, humanize, loguru, multiprocess, numpy, tqdm
Required-by:
---
Name: unsloth
Version: 2024.11.5
Summary: 2-5X faster LLM finetuning
Home-page: http://www.unsloth.ai
Author: Unsloth AI team
Author-email: [email protected]
---
Name: deepspeed
Version: 0.15.3
Summary: DeepSpeed library
Home-page: http://deepspeed.ai
Author: DeepSpeed Team
Author-email: [email protected]
License: Apache Software License 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: hjson, msgpack, ninja, numpy, nvidia-ml-py, packaging, psutil, py-cpuinfo, pydantic, torch, tqdm
Required-by:
---
Name: PyGithub
Version: 2.5.0
Summary: Use the full Github API v3
Home-page:
Author:
Author-email: Vincent Jacques 
License:
Location: /usr/local/lib/python3.11/dist-packages
Requires: Deprecated, pyjwt, pynacl, requests, typing-extensions, urllib3
Required-by:
---
Name: flash-attn
Version: 2.6.3
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: [email protected]
License:
Location: /usr/local/lib/python3.11/dist-packages
Requires: einops, torch
Required-by:
---
Name: huggingface-hub
Version: 0.26.2
Summary: Client library to download and publish models, datasets and other repos on the huggingface.co hub
Home-page: https://github.com/huggingface/huggingface_hub
Author: Hugging Face, Inc.
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, fsspec, packaging, pyyaml, requests, tqdm, typing-extensions
Required-by: accelerate, datasets, datatrove, evaluate, peft, tokenizers, transformers, unsloth_zoo
---
Name: evaluate
Version: 0.4.3
Summary: HuggingFace community-driven open-source library of evaluation
Home-page: https://github.com/huggingface/evaluate
Author: HuggingFace Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: datasets, dill, fsspec, huggingface-hub, multiprocess, numpy, packaging, pandas, requests, tqdm, xxhash
Required-by:
---
Name: datasets
Version: 3.1.0
Summary: HuggingFace community-driven open-source library of datasets
Home-page: https://github.com/huggingface/datasets
Author: HuggingFace Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: aiohttp, dill, filelock, fsspec, huggingface-hub, multiprocess, numpy, packaging, pandas, pyarrow, pyyaml, requests, tqdm, xxhash
Required-by: evaluate, trl, unsloth_zoo
---
Name: bitsandbytes
Version: 0.44.1
Summary: k-bit optimizers and matrix multiplication routines.
Home-page: https://github.com/TimDettmers/bitsandbytes
Author: Tim Dettmers
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.11/dist-packages
Requires: numpy, torch
Required-by:
---
Name: einops
Version: 0.8.0
Summary: A new flavour of deep learning operations
Home-page: https://github.com/arogozhnikov/einops
Author: Alex Rogozhnikov
Author-email:
License: MIT
Location: /usr/local/lib/python3.11/dist-packages
Requires:
Required-by: flash-attn
---
Name: wandb
Version: 0.18.6
Summary: A CLI and library for interacting with the Weights & Biases API.
Home-page:
Author:
Author-email: Weights & Biases 
---
Name: pandas
Version: 2.2.3
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author:
Author-email: The Pandas Development Team 
---
Name: numpy
Version: 1.26.3
Summary: Fundamental package for array computing in Python
Home-page: https://numpy.org
Author: Travis E. Oliphant et al.
Required-by: accelerate, bitsandbytes, contourpy, datasets, datatrove, deepspeed, evaluate, matplotlib, pandas, peft, scikit-learn, scipy, tensorboard, torchvision, transformers, unsloth_zoo, xformers
---
Name: scipy
Version: 1.14.1
Summary: Fundamental algorithms for scientific computing in Python
Home-page: https://scipy.org/
Author:
Author-email:
---
Name: sentencepiece
Version: 0.2.0
Summary: SentencePiece python wrapper
Home-page: https://github.com/google/sentencepiece
Author: Taku Kudo
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.11/dist-packages
Requires:
Required-by: unsloth_zoo
---
Name: nltk
Version: 3.9.1
Summary: Natural Language Toolkit
Home-page: https://www.nltk.org/
Author: NLTK Team
Author-email: [email protected]
License: Apache License, Version 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: click, joblib, regex, tqdm
Required-by:
---
Name: xformers
Version: 0.0.28.post3
Summary: XFormers: A collection of composable Transformer building blocks.
Home-page: https://facebookresearch.github.io/xformers/
Author: Facebook AI Research
Author-email: [email protected]
License:
Location: /usr/local/lib/python3.11/dist-packages
Requires: numpy, torch
Required-by:
---
Name: hf_transfer
Version: 0.1.8
Summary: Speed up file transfers with the Hugging Face Hub.
Home-page:
Author:
Author-email:
License:
Location: /usr/local/lib/python3.11/dist-packages
Requires:
Required-by: unsloth_zoo
---
Name: scikit-learn
Version: 1.5.2
Summary: A set of python modules for machine learning and data mining
Home-page: https://scikit-learn.org
Author:
Author-email:
License: BSD 3-Clause License
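
A shorter version listing than the full `pip show` output above can be collected with the standard-library `importlib.metadata` module. This is only a minimal sketch: the helper name `report_versions` and the `PACKAGES` list are assumptions chosen for illustration (the names mirror a few entries from the listing), not part of the example scripts.

```python
from importlib import metadata

# Hypothetical package list for illustration; adjust to the packages you need.
PACKAGES = ["transformers", "accelerate", "peft", "trl", "deepspeed", "flash-attn"]


def report_versions(names):
    """Return a dict mapping each distribution name to its installed version.

    Packages that are not installed map to the string "not installed".
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions(PACKAGES).items():
        print(f"{name}: {version}")
```

Running the script prints one `name: version` line per package, which is usually enough context for a bug report.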

Amerehei commented Nov 8, 2024

Log
accelerate launch --config_file "configs/fsdp_config.yaml" train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-7b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--max_grad_norm 1.0 \
--output_dir "mistral-sft-lora-fsdp" \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--gradient_checkpointing True \
--use_reentrant False \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization False
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 609/609 [00:00<00:00, 1.96MB/s]
model.safetensors.index.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 69.7MB/s]
model-00001-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.98G/9.98G [03:51<00:00, 43.0MB/s]
model-00002-of-00002.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [00:17<00:00, 202MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.72s/it]
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.71s/it]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.71s/it]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.71s/it]
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 105.78s/it]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.71s/it]
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.73s/it]
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.73s/it]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.73s/it]
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.50it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.44it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.30it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.37it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.47it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.33it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.29it/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 188/188 [00:00<00:00, 1.53MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 776/776 [00:00<00:00, 7.62MB/s]
tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 38.8MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 16.9MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 1.49MB/s]
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Loading checkpoint shards:  50%|██████████████████████████████████████████████████████████████████████▌                                                                      | 1/2 [00:07<00:07,  7.58s/it]
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.09s/it]
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
README.md: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 524/524 [00:00<00:00, 4.88MB/s]
train-00000-of-00001.parquet: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35.2M/35.2M [00:00<00:00, 42.5MB/s]
test-00000-of-00001.parquet: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.08M/7.08M [00:00<00:00, 42.3MB/s]
Generating train split: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 44326.80 examples/s]
Generating test split: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 49700.55 examples/s]
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 761.49 examples/s]
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 789.50 examples/s]
Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2276.91 examples/s]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2460.44 examples/s]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1008.03 examples/s]
Size of the train set: 10. Size of the validation set: 10
[rank3]:[W1108 16:43:16.179385976 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 3]  using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank5]:[W1108 16:43:16.198058004 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 5]  using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank2]:[W1108 16:43:16.203509811 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 2]  using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank6]:[W1108 16:43:16.216316855 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 6]  using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank1]:[W1108 16:43:16.240819550 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 1]  using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank7]:[W1108 16:43:16.249535497 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 7]  using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank4]:[W1108 16:43:20.883546315 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 4]  using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Generating train split: 8 examples [00:00, 291.34 examples/s]
Generating train split: 8 examples [00:00, 541.47 examples/s]
[rank0]:[W1108 16:43:55.752157448 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 0]  using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[2024-11-08 16:43:58,359] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
[2024-11-08 16:43:58,441] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,446] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,447] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,516] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,578] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,629] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,689] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Using auto half precision backend
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32008, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaFlashAttention2(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=11008, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=11008, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
            (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
          )
        )
        (norm): LlamaRMSNorm((4096,), eps=1e-05)
        (rotary_emb): LlamaRotaryEmbedding()
      )
      (lm_head): Linear(in_features=4096, out_features=32008, bias=False)
    )
  )
)
***** Running training *****
Num examples = 8
Num Epochs = 1
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 256
Gradient Accumulation steps = 4
Total optimization steps = 1
Number of trainable parameters = 2,498,560
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Tracking run with wandb version 0.18.6
wandb: Run data is saved locally in /workspace/wandb/run-20241108_164934-d2cvs1zs
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run mistral-sft-lora-fsdp
wandb: ⭐️ View project at https://wandb.ai/a-amerehi/huggingface
wandb: 🚀 View run at https://wandb.ai/a-amerehi/huggingface/runs/d2cvs1zs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.91s/it]
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.
warnings.warn(
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/config.json
/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.
warnings.warn(
Model config LlamaConfig {
"_name_or_path": "meta-llama/Llama-2-7b-hf",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.47.0.dev0",
"use_cache": true,
"vocab_size": 32000
}

/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting save_embedding_layers to True as the embedding layer has been resized during finetuning.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting save_embedding_layers to True as the embedding layer has been resized during finetuning.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting save_embedding_layers to True as the embedding layer has been resized during finetuning.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting save_embedding_layers to True as the embedding layer has been resized during finetuning.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting save_embedding_layers to True as the embedding layer has been resized during finetuning.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
[rank6]: Traceback (most recent call last):
[rank6]: File "/workspace/train.py", line 155, in <module>
[rank6]: main(model_args, data_args, training_args)
[rank6]: File "/workspace/train.py", line 139, in main
[rank6]: trainer.train(resume_from_checkpoint=checkpoint)
[rank6]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank6]: return inner_training_loop(
[rank6]: ^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank6]: self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank6]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank6]: self._save_checkpoint(model, trial)
[rank6]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank6]: self._save_optimizer_and_scheduler(output_dir)
[rank6]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank6]: save_fsdp_optimizer(
[rank6]: File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank6]: optim_state = FSDP.optim_state_dict(model, optimizer)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank6]: return FullyShardedDataParallel._optim_state_dict_impl(
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank6]: return _optim_state_dict(
[rank6]: ^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank6]: return func(*args, **kwargs)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank6]: fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank6]: nested_unflat_param_names = [
[rank6]: ^
[rank6]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank6]: param_to_fqns[param] for param in param_group_params
[rank6]: ~~~~~~~~~~~~~^^^^^^^
[rank6]: KeyError: Parameter containing:
[rank6]: tensor([[ 0.0007, -0.0035, -0.0132, ..., 0.0048, 0.0075, -0.0131],
[rank6]: [-0.0077, 0.0071, 0.0069, ..., 0.0037, 0.0114, -0.0142],
[rank6]: [-0.0058, 0.0103, -0.0030, ..., -0.0134, 0.0156, 0.0019],
[rank6]: ...,
[rank6]: [ 0.0084, 0.0016, -0.0019, ..., -0.0135, -0.0142, -0.0084],
[rank6]: [-0.0133, -0.0083, 0.0022, ..., -0.0101, 0.0025, -0.0026],
[rank6]: [ 0.0148, -0.0037, 0.0084, ..., -0.0073, -0.0091, 0.0124]],
[rank6]: device='cuda:6', requires_grad=True)
[rank3]: Traceback (most recent call last):
[rank3]: File "/workspace/train.py", line 155, in <module>
[rank3]: main(model_args, data_args, training_args)
[rank3]: File "/workspace/train.py", line 139, in main
[rank3]: trainer.train(resume_from_checkpoint=checkpoint)
[rank3]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank3]: return inner_training_loop(
[rank3]: ^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank3]: self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank3]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank3]: self._save_checkpoint(model, trial)
[rank3]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank3]: self._save_optimizer_and_scheduler(output_dir)
[rank3]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank3]: save_fsdp_optimizer(
[rank3]: File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank3]: optim_state = FSDP.optim_state_dict(model, optimizer)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank3]: return FullyShardedDataParallel._optim_state_dict_impl(
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank3]: return _optim_state_dict(
[rank3]: ^^^^^^^^^^^^^^^^^^
[rank3]: File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank3]: return func(*args, **kwargs)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank3]: fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank3]: nested_unflat_param_names = [
[rank3]: ^
[rank3]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank3]: param_to_fqns[param] for param in param_group_params
[rank3]: ~~~~~~~~~~~~~^^^^^^^
[rank3]: KeyError: Parameter containing:
[rank3]: tensor([[ 0.0007, -0.0035, -0.0132, ..., 0.0048, 0.0075, -0.0131],
[rank3]: [-0.0077, 0.0071, 0.0069, ..., 0.0037, 0.0114, -0.0142],
[rank3]: [-0.0058, 0.0103, -0.0030, ..., -0.0134, 0.0156, 0.0019],
[rank3]: ...,
[rank3]: [ 0.0084, 0.0016, -0.0019, ..., -0.0135, -0.0142, -0.0084],
[rank3]: [-0.0133, -0.0083, 0.0022, ..., -0.0101, 0.0025, -0.0026],
[rank3]: [ 0.0148, -0.0037, 0.0084, ..., -0.0073, -0.0091, 0.0124]],
[rank3]: device='cuda:3', requires_grad=True)
[rank2]: Traceback (most recent call last):
[rank2]: File "/workspace/train.py", line 155, in <module>
[rank2]: main(model_args, data_args, training_args)
[rank2]: File "/workspace/train.py", line 139, in main
[rank2]: trainer.train(resume_from_checkpoint=checkpoint)
[rank2]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank2]: return inner_training_loop(
[rank2]: ^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank2]: self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank2]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank2]: self._save_checkpoint(model, trial)
[rank2]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank2]: self._save_optimizer_and_scheduler(output_dir)
[rank2]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank2]: save_fsdp_optimizer(
[rank2]: File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank2]: optim_state = FSDP.optim_state_dict(model, optimizer)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank2]: return FullyShardedDataParallel._optim_state_dict_impl(
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank2]: return _optim_state_dict(
[rank2]: ^^^^^^^^^^^^^^^^^^
[rank2]: File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank2]: return func(*args, **kwargs)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank2]: fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank2]: nested_unflat_param_names = [
[rank2]: ^
[rank2]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank2]: param_to_fqns[param] for param in param_group_params
[rank2]: ~~~~~~~~~~~~~^^^^^^^
[rank2]: KeyError: Parameter containing:
[rank2]: tensor([[ 0.0007, -0.0035, -0.0132, ..., 0.0048, 0.0075, -0.0131],
[rank2]: [-0.0077, 0.0071, 0.0069, ..., 0.0037, 0.0114, -0.0142],
[rank2]: [-0.0058, 0.0103, -0.0030, ..., -0.0134, 0.0156, 0.0019],
[rank2]: ...,
[rank2]: [ 0.0084, 0.0016, -0.0019, ..., -0.0135, -0.0142, -0.0084],
[rank2]: [-0.0133, -0.0083, 0.0022, ..., -0.0101, 0.0025, -0.0026],
[rank2]: [ 0.0148, -0.0037, 0.0084, ..., -0.0073, -0.0091, 0.0124]],
[rank2]: device='cuda:2', requires_grad=True)
[rank4]: Traceback (most recent call last):
[rank4]: File "/workspace/train.py", line 155, in <module>
[rank4]: main(model_args, data_args, training_args)
[rank4]: File "/workspace/train.py", line 139, in main
[rank4]: trainer.train(resume_from_checkpoint=checkpoint)
[rank4]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank4]: return inner_training_loop(
[rank4]: ^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank4]: self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank4]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank4]: self._save_checkpoint(model, trial)
[rank4]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank4]: self._save_optimizer_and_scheduler(output_dir)
[rank4]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank4]: save_fsdp_optimizer(
[rank4]: File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank4]: optim_state = FSDP.optim_state_dict(model, optimizer)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank4]: return FullyShardedDataParallel._optim_state_dict_impl(
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank4]: return _optim_state_dict(
[rank4]: ^^^^^^^^^^^^^^^^^^
[rank4]: File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank4]: return func(*args, **kwargs)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank4]: fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank4]: nested_unflat_param_names = [
[rank4]: ^
[rank4]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank4]: param_to_fqns[param] for param in param_group_params
[rank4]: ~~~~~~~~~~~~~^^^^^^^
[rank4]: KeyError: Parameter containing:
[rank4]: tensor([[ 0.0007, -0.0035, -0.0132, ..., 0.0048, 0.0075, -0.0131],
[rank4]: [-0.0077, 0.0071, 0.0069, ..., 0.0037, 0.0114, -0.0142],
[rank4]: [-0.0058, 0.0103, -0.0030, ..., -0.0134, 0.0156, 0.0019],
[rank4]: ...,
[rank4]: [ 0.0084, 0.0016, -0.0019, ..., -0.0135, -0.0142, -0.0084],
[rank4]: [-0.0133, -0.0083, 0.0022, ..., -0.0101, 0.0025, -0.0026],
[rank4]: [ 0.0148, -0.0037, 0.0084, ..., -0.0073, -0.0091, 0.0124]],
[rank4]: device='cuda:4', requires_grad=True)
[rank7]: Traceback (most recent call last):
[rank7]: File "/workspace/train.py", line 155, in <module>
[rank7]: main(model_args, data_args, training_args)
[rank7]: File "/workspace/train.py", line 139, in main
[rank7]: trainer.train(resume_from_checkpoint=checkpoint)
[rank7]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank7]: return inner_training_loop(
[rank7]: ^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank7]: self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank7]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank7]: self._save_checkpoint(model, trial)
[rank7]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank7]: self._save_optimizer_and_scheduler(output_dir)
[rank7]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank7]: save_fsdp_optimizer(
[rank7]: File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank7]: optim_state = FSDP.optim_state_dict(model, optimizer)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank7]: return FullyShardedDataParallel._optim_state_dict_impl(
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank7]: return _optim_state_dict(
[rank7]: ^^^^^^^^^^^^^^^^^^
[rank7]: File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank7]: return func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank7]: fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank7]: nested_unflat_param_names = [
[rank7]: ^
[rank7]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank7]: param_to_fqns[param] for param in param_group_params
[rank7]: ~~~~~~~~~~~~~^^^^^^^
[rank7]: KeyError: Parameter containing:
[rank7]: tensor([[ 0.0007, -0.0035, -0.0132, ..., 0.0048, 0.0075, -0.0131],
[rank7]: [-0.0077, 0.0071, 0.0069, ..., 0.0037, 0.0114, -0.0142],
[rank7]: [-0.0058, 0.0103, -0.0030, ..., -0.0134, 0.0156, 0.0019],
[rank7]: ...,
[rank7]: [ 0.0084, 0.0016, -0.0019, ..., -0.0135, -0.0142, -0.0084],
[rank7]: [-0.0133, -0.0083, 0.0022, ..., -0.0101, 0.0025, -0.0026],
[rank7]: [ 0.0148, -0.0037, 0.0084, ..., -0.0073, -0.0091, 0.0124]],
[rank7]: device='cuda:7', requires_grad=True)
[rank5]: Traceback (most recent call last):
[rank5]: File "/workspace/train.py", line 155, in <module>
[rank5]: main(model_args, data_args, training_args)
[rank5]: File "/workspace/train.py", line 139, in main
[rank5]: trainer.train(resume_from_checkpoint=checkpoint)
[rank5]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank5]: return inner_training_loop(
[rank5]: ^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank5]: self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank5]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank5]: self._save_checkpoint(model, trial)
[rank5]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank5]: self._save_optimizer_and_scheduler(output_dir)
[rank5]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank5]: save_fsdp_optimizer(
[rank5]: File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank5]: optim_state = FSDP.optim_state_dict(model, optimizer)
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank5]: return FullyShardedDataParallel._optim_state_dict_impl(
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank5]: return _optim_state_dict(
[rank5]: ^^^^^^^^^^^^^^^^^^
[rank5]: File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank5]: return func(*args, **kwargs)
[rank5]: ^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank5]: fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank5]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank5]: nested_unflat_param_names = [
[rank5]: ^
[rank5]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank5]: param_to_fqns[param] for param in param_group_params
[rank5]: ~~~~~~~~~~~~~^^^^^^^
[rank5]: KeyError: Parameter containing:
[rank5]: tensor([[ 0.0007, -0.0035, -0.0132, ..., 0.0048, 0.0075, -0.0131],
[rank5]: [-0.0077, 0.0071, 0.0069, ..., 0.0037, 0.0114, -0.0142],
[rank5]: [-0.0058, 0.0103, -0.0030, ..., -0.0134, 0.0156, 0.0019],
[rank5]: ...,
[rank5]: [ 0.0084, 0.0016, -0.0019, ..., -0.0135, -0.0142, -0.0084],
[rank5]: [-0.0133, -0.0083, 0.0022, ..., -0.0101, 0.0025, -0.0026],
[rank5]: [ 0.0148, -0.0037, 0.0084, ..., -0.0073, -0.0091, 0.0124]],
[rank5]: device='cuda:5', requires_grad=True)
Traceback (most recent call last):
File "/workspace/train.py", line 155, in <module>
main(model_args, data_args, training_args)
File "/workspace/train.py", line 139, in main
trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial)
File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
self._save_optimizer_and_scheduler(output_dir)
[rank1]: Traceback (most recent call last):
[rank1]: File "/workspace/train.py", line 155, in <module>
[rank1]: main(model_args, data_args, training_args)
[rank1]: File "/workspace/train.py", line 139, in main
[rank1]: trainer.train(resume_from_checkpoint=checkpoint)
[rank1]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank1]: return inner_training_loop(
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank1]: self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank1]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank1]: self._save_checkpoint(model, trial)
[rank1]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank1]: self._save_optimizer_and_scheduler(output_dir)
[rank1]: File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank1]: save_fsdp_optimizer(
[rank1]: File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank1]: optim_state = FSDP.optim_state_dict(model, optimizer)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank1]: return FullyShardedDataParallel._optim_state_dict_impl(
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank1]: return _optim_state_dict(
[rank1]: ^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank1]: fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank1]: nested_unflat_param_names = [
[rank1]: ^
[rank1]: File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank1]: param_to_fqns[param] for param in param_group_params
[rank1]: ~~~~~~~~~~~~~^^^^^^^
[rank1]: KeyError: Parameter containing:
[rank1]: tensor([[ 0.0007, -0.0035, -0.0132, ..., 0.0048, 0.0075, -0.0131],
[rank1]: [-0.0077, 0.0071, 0.0069, ..., 0.0037, 0.0114, -0.0142],
[rank1]: [-0.0058, 0.0103, -0.0030, ..., -0.0134, 0.0156, 0.0019],
[rank1]: ...,
[rank1]: [ 0.0084, 0.0016, -0.0019, ..., -0.0135, -0.0142, -0.0084],
[rank1]: [-0.0133, -0.0083, 0.0022, ..., -0.0101, 0.0025, -0.0026],
[rank1]: [ 0.0148, -0.0037, 0.0084, ..., -0.0073, -0.0091, 0.0124]],
[rank1]: device='cuda:1', requires_grad=True)
File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
save_fsdp_optimizer(
File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
optim_state = FSDP.optim_state_dict(model, optimizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
return FullyShardedDataParallel._optim_state_dict_impl(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
return _optim_state_dict(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
fsdp_osd["param_groups"] = _unflatten_param_groups(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
nested_unflat_param_names = [
^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
param_to_fqns[param] for param in param_group_params

KeyError: Parameter containing:
tensor([[ 0.0007, -0.0035, -0.0132,  ...,  0.0048,  0.0075, -0.0131],
[-0.0077,  0.0071,  0.0069,  ...,  0.0037,  0.0114, -0.0142],
[-0.0058,  0.0103, -0.0030,  ..., -0.0134,  0.0156,  0.0019],
...,
[ 0.0084,  0.0016, -0.0019,  ..., -0.0135, -0.0142, -0.0084],
[-0.0133, -0.0083,  0.0022,  ..., -0.0101,  0.0025, -0.0026],
[ 0.0148, -0.0037,  0.0084,  ..., -0.0073, -0.0091,  0.0124]],
device='cuda:0', requires_grad=True)
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/train.py", line 155, in <module>
[rank0]:     main(model_args, data_args, training_args)
[rank0]:   File "/workspace/train.py", line 139, in main
[rank0]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank0]:     self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank0]:     self._save_checkpoint(model, trial)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank0]:     self._save_optimizer_and_scheduler(output_dir)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank0]:     save_fsdp_optimizer(
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank0]:     optim_state = FSDP.optim_state_dict(model, optimizer)
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank0]:     return FullyShardedDataParallel._optim_state_dict_impl(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank0]:     return _optim_state_dict(
[rank0]:            ^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank0]:     fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank0]:                                ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank0]:     nested_unflat_param_names = [
[rank0]:                                 ^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank0]:     param_to_fqns[param] for param in param_group_params
[rank0]:     ~~~~~~~~~~~~~^^^^^^^
[rank0]: KeyError: Parameter containing:
[rank0]: tensor([[ 0.0007, -0.0035, -0.0132,  ...,  0.0048,  0.0075, -0.0131],
[rank0]:         [-0.0077,  0.0071,  0.0069,  ...,  0.0037,  0.0114, -0.0142],
[rank0]:         [-0.0058,  0.0103, -0.0030,  ..., -0.0134,  0.0156,  0.0019],
[rank0]:         ...,
[rank0]:         [ 0.0084,  0.0016, -0.0019,  ..., -0.0135, -0.0142, -0.0084],
[rank0]:         [-0.0133, -0.0083,  0.0022,  ..., -0.0101,  0.0025, -0.0026],
[rank0]:         [ 0.0148, -0.0037,  0.0084,  ..., -0.0073, -0.0091,  0.0124]],
[rank0]:        device='cuda:0', requires_grad=True)
wandb: 🚀 View run mistral-sft-lora-fsdp at: https://wandb.ai/a-amerehi/huggingface/runs/d2cvs1zs
wandb: Find logs at: wandb/run-20241108_164934-d2cvs1zs/logs
W1108 16:49:59.928000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2243 closing signal SIGTERM
W1108 16:49:59.930000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2244 closing signal SIGTERM
W1108 16:49:59.930000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2245 closing signal SIGTERM
W1108 16:49:59.931000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2247 closing signal SIGTERM
W1108 16:49:59.931000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2248 closing signal SIGTERM
W1108 16:49:59.931000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2249 closing signal SIGTERM
W1108 16:49:59.931000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2250 closing signal SIGTERM
E1108 16:50:01.148000 2163 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 3 (pid: 2246) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1155, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 793, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time      : 2024-11-08_16:49:59
host      : e3997253d925
rank      : 3 (local_rank: 3)
exitcode  : 1 (pid: 2246)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
   </pre>
</details>


@Amerehei
Author

Amerehei commented Nov 8, 2024

I have one more question: why are there so many warnings in the log, especially deprecation warnings?

@Amerehei
Author

@BenjaminBossan @qgallouedec
Any idea?

@BenjaminBossan
Member

Sorry for the delay in replying, @Amerehei; we're currently at a company offsite. Hopefully, at the start of next week, I'll have the opportunity to try to reproduce this and will report back.

@Amerehei
Author

Thanks Benjamin for your response

@BenjaminBossan
Member

BenjaminBossan commented Nov 18, 2024

I finally got around to testing this. I tried to stick closely to your settings but used only 2 GPUs and reduced some numbers like batch size for memory. Regarding the packages, I use trl 0.12.1 and torch 2.5.1. At first, training seemed to run fine. But when I changed the save_strategy to every 3 steps to trigger a checkpoint more quickly, I got the same error as you. So I assume that for you, the training itself also works, it's just that the model checkpoint is failing.

As a next step, I switched to a much smaller model (opt-125m) and tried full fine-tuning to check if the error is PEFT-related. Interestingly, I got the same type of error (KeyError: Parameter containing: ...). This makes it likely that the issue is not directly PEFT-related. It could instead be an error in the train.py script or an error with the SFTTrainer or accelerate. I tried an older trl version (0.10.1) but still the same error. Downgrading accelerate resulted in other errors. Searching for the error message, I didn't find much at all.

All this leaves me a bit puzzled. Tentatively pinging @muellerzr in case he has come across this error or knows someone else who might have.

PS: Also tried fsdp_use_orig_params: true but no luck.
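The tracebacks above fail at `param_to_fqns[param]`, which suggests the optimizer is holding at least one parameter object that the model's parameter-to-name mapping does not contain. A minimal diagnostic sketch of that idea follows; the helper name `find_untracked_optimizer_params` is hypothetical, and the comparison against `model.named_parameters()` is an assumption about how the mapping is built, not a confirmed description of the FSDP internals.

```python
def find_untracked_optimizer_params(model, optimizer):
    """Return (group_index, position) pairs for optimizer params the model does not expose.

    Hypothetical helper: it only needs `model.named_parameters()` and
    `optimizer.param_groups`, which torch modules and optimizers provide.
    """
    # Compare by object identity rather than by value.
    known = {id(param) for _, param in model.named_parameters()}
    missing = []
    for group_index, group in enumerate(optimizer.param_groups):
        for position, param in enumerate(group["params"]):
            if id(param) not in known:
                missing.append((group_index, position))
    return missing
```

Calling this right before a checkpoint is saved (for example from a trainer callback) and getting a non-empty list would be consistent with the `KeyError`; an empty list would point elsewhere.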

@vrancurel

I have the same issue when saving a model fine-tuned with QLoRA.

@BenjaminBossan
Member

Thanks for the additional feedback. I did some more testing and I could get the checkpoint to work by downgrading to the following packages:

  • trl==0.11.0
  • tokenizers>=0.19,<0.20
  • transformers==4.44.2
  • accelerate==0.33.0

Note that those are most likely not the exact maximal versions, but it's very hard to figure those out because I had to change all 4 of them together due to their mutual dependencies.

@vrancurel @Amerehei It would be great if you could test this out and report back if those versions solve the issue for you too. If that's the case, it confirms my suspicion that the error is not PEFT related.
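To confirm which versions are actually active in an environment before re-running, something like the following sketch can help. It uses only the standard library's `importlib.metadata`; the names `REPORTED_WORKING`, `installed_versions`, and `mismatches` are illustrative, and the `tokenizers` entry is simplified to the `0.19` prefix to stand in for the `>=0.19,<0.20` range.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported to work in this thread; "0.19" is treated as a prefix.
REPORTED_WORKING = {
    "trl": "0.11.0",
    "tokenizers": "0.19",
    "transformers": "4.44.2",
    "accelerate": "0.33.0",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed, expected):
    """Return packages whose installed version differs from the expected one."""
    out = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or not (have == want or have.startswith(want + ".")):
            out[name] = have
    return out


if __name__ == "__main__":
    current = installed_versions(REPORTED_WORKING)
    for name, have in mismatches(current, REPORTED_WORKING).items():
        print(f"{name}: installed {have}, thread reports {REPORTED_WORKING[name]}")
```

Running it inside the same container that launches `accelerate` avoids checking a different Python installation by accident.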

@Amerehei
Author

@BenjaminBossan I'm not sure if I did it right, but I now have a different problem.

I ran the following command to downgrade the libraries:

pip install trl==0.11.0 "tokenizers>=0.19,<0.20" transformers==4.44.2 accelerate==0.33.0

After running the model, I get:

Log
Running command: accelerate launch --config_file configs/fsdp_config.yaml  train.py       --seed 100       --model_name_or_path meta-llama/Llama-2-7b-hf       --dataset_name smangrul/ultrachat-10k-chatml       --chat_template_format chatml       --add_special_tokens False       --append_concat_token False       --splits train,test       --max_seq_len 2048       --num_train_epochs 1       --logging_steps 5       --log_level info       --logging_strategy steps       --eval_strategy epoch       --save_strategy epoch       --push_to_hub       --hub_private_repo True       --hub_strategy every_save       --bf16 True       --packing True       --learning_rate 1e-4       --lr_scheduler_type cosine       --weight_decay 1e-4       --warmup_ratio 0.0       --max_grad_norm 1.0       --output_dir mistral-sft-lora-fsdp       --per_device_train_batch_size 8       --per_device_eval_batch_size 8       --gradient_accumulation_steps 4       --gradient_checkpointing True       --use_reentrant False       --dataset_text_field content       --use_flash_attn True       --use_peft_lora True       --lora_r 8       --lora_alpha 16       --lora_dropout 0.1       --lora_target_modules all-linear       --use_4bit_quantization False
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 609/609 [00:00<00:00, 2.50MB/s]
[rank3]: Traceback (most recent call last):
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1603, in _get_module
[rank3]:     return importlib.import_module("." + module_name, self.__name__)
[rank3]:   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
[rank3]:     return _bootstrap._gcd_import(name[level:], package, level)
[rank3]:   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
[rank3]:   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
[rank3]:   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
[rank3]:   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
[rank3]:   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
[rank3]:   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
[rank3]:     from ...modeling_flash_attention_utils import _flash_attention_forward
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 27, in <module>
[rank3]:     from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
[rank3]:     from flash_attn.flash_attn_interface import (
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
[rank3]:     import flash_attn_2_cuda as flash_attn_cuda
[rank3]: ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

[rank3]: The above exception was the direct cause of the following exception:

[rank3]: Traceback (most recent call last):
[rank3]: File "/workspace/train.py", line 155, in <module>
[rank3]: main(model_args, data_args, training_args)
[rank3]: File "/workspace/train.py", line 101, in main
[rank3]: model, peft_config, tokenizer = create_and_prepare_model(model_args, data_args, training_args)
[rank3]: File "/workspace/utils.py", line 141, in create_and_prepare_model
[rank3]: model = AutoModelForCausalLM.from_pretrained(
[rank3]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
[rank3]: model_class = _get_model_class(config, cls._model_mapping)
[rank3]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 384, in _get_model_class
[rank3]: supported_models = model_mapping[type(config)]
[rank3]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 735, in __getitem__
[rank3]: return self._load_attr_from_module(model_type, model_name)
[rank3]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 749, in _load_attr_from_module
[rank3]: return getattribute_from_module(self._modules[module_name], attr)
[rank3]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 693, in getattribute_from_module
[rank3]: if hasattr(module, attr):
[rank3]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1593, in __getattr__
[rank3]: module = self._get_module(self._class_to_module[name])
[rank3]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1605, in _get_module
[rank3]: raise RuntimeError(
[rank3]: RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
[rank3]: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1603, in _get_module
[rank0]: return importlib.import_module("." + module_name, self.__name__)
[rank0]: File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
[rank0]: return _bootstrap._gcd_import(name[level:], package, level)
[rank0]: File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
[rank0]: File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
[rank0]: File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
[rank0]: File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
[rank0]: File "<frozen importlib._bootstrap_external>", line 883, in exec_module
[rank0]: File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
[rank0]: from ...modeling_flash_attention_utils import _flash_attention_forward
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 27, in <module>
[rank0]: from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input # noqa
[rank0]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
[rank0]: from flash_attn.flash_attn_interface import (
[rank0]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
[rank0]: import flash_attn_2_cuda as flash_attn_cuda
[rank0]: ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]: File "/workspace/train.py", line 155, in <module>
[rank0]: main(model_args, data_args, training_args)
[rank0]: File "/workspace/train.py", line 101, in main
[rank0]: model, peft_config, tokenizer = create_and_prepare_model(model_args, data_args, training_args)
[rank0]: File "/workspace/utils.py", line 141, in create_and_prepare_model
[rank0]: model = AutoModelForCausalLM.from_pretrained(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
[rank0]: model_class = _get_model_class(config, cls._model_mapping)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 384, in _get_model_class
[rank0]: supported_models = model_mapping[type(config)]
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 735, in __getitem__
[rank0]: return self._load_attr_from_module(model_type, model_name)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 749, in _load_attr_from_module
[rank0]: return getattribute_from_module(self._modules[module_name], attr)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 693, in getattribute_from_module
[rank0]: if hasattr(module, attr):
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1593, in __getattr__
[rank0]: module = self._get_module(self._class_to_module[name])
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1605, in _get_module
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
[rank0]: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
[rank2]: Traceback (most recent call last):
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1603, in _get_module
[rank2]: return importlib.import_module("." + module_name, self.__name__)
[rank2]: File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
[rank2]: return _bootstrap._gcd_import(name[level:], package, level)
[rank2]: File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
[rank2]: File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
[rank2]: File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
[rank2]: File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
[rank2]: File "<frozen importlib._bootstrap_external>", line 883, in exec_module
[rank2]: File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
[rank2]: from ...modeling_flash_attention_utils import _flash_attention_forward
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 27, in <module>
[rank2]: from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input # noqa
[rank2]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
[rank2]: from flash_attn.flash_attn_interface import (
[rank2]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
[rank2]: import flash_attn_2_cuda as flash_attn_cuda
[rank2]: ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

[rank2]: The above exception was the direct cause of the following exception:

[rank2]: Traceback (most recent call last):
[rank2]: File "/workspace/train.py", line 155, in <module>
[rank2]: main(model_args, data_args, training_args)
[rank2]: File "/workspace/train.py", line 101, in main
[rank2]: model, peft_config, tokenizer = create_and_prepare_model(model_args, data_args, training_args)
[rank2]: File "/workspace/utils.py", line 141, in create_and_prepare_model
[rank2]: model = AutoModelForCausalLM.from_pretrained(
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
[rank2]: model_class = _get_model_class(config, cls._model_mapping)
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 384, in _get_model_class
[rank2]: supported_models = model_mapping[type(config)]
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 735, in __getitem__
[rank2]: return self._load_attr_from_module(model_type, model_name)
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 749, in _load_attr_from_module
[rank2]: return getattribute_from_module(self._modules[module_name], attr)
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 693, in getattribute_from_module
[rank2]: if hasattr(module, attr):
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1593, in __getattr__
[rank2]: module = self._get_module(self._class_to_module[name])
[rank2]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1605, in _get_module
[rank2]: raise RuntimeError(
[rank2]: RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
[rank2]: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
[rank1]: Traceback (most recent call last):
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1603, in _get_module
[rank1]: return importlib.import_module("." + module_name, self.__name__)
[rank1]: File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
[rank1]: return _bootstrap._gcd_import(name[level:], package, level)
[rank1]: File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
[rank1]: File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
[rank1]: File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
[rank1]: File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
[rank1]: File "<frozen importlib._bootstrap_external>", line 883, in exec_module
[rank1]: File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
[rank1]: from ...modeling_flash_attention_utils import _flash_attention_forward
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 27, in <module>
[rank1]: from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input # noqa
[rank1]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
[rank1]: from flash_attn.flash_attn_interface import (
[rank1]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
[rank1]: import flash_attn_2_cuda as flash_attn_cuda
[rank1]: ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

[rank1]: The above exception was the direct cause of the following exception:

[rank1]: Traceback (most recent call last):
[rank1]: File "/workspace/train.py", line 155, in <module>
[rank1]: main(model_args, data_args, training_args)
[rank1]: File "/workspace/train.py", line 101, in main
[rank1]: model, peft_config, tokenizer = create_and_prepare_model(model_args, data_args, training_args)
[rank1]: File "/workspace/utils.py", line 141, in create_and_prepare_model
[rank1]: model = AutoModelForCausalLM.from_pretrained(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
[rank1]: model_class = _get_model_class(config, cls._model_mapping)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 384, in _get_model_class
[rank1]: supported_models = model_mapping[type(config)]
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 735, in __getitem__
[rank1]: return self._load_attr_from_module(model_type, model_name)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 749, in _load_attr_from_module
[rank1]: return getattribute_from_module(self._modules[module_name], attr)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 693, in getattribute_from_module
[rank1]: if hasattr(module, attr):
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1593, in __getattr__
[rank1]: module = self._get_module(self._class_to_module[name])
[rank1]: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1605, in _get_module
[rank1]: raise RuntimeError(
[rank1]: RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
[rank1]: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
W1121 13:30:42.651000 1098 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1167 closing signal SIGTERM
E1121 13:30:42.815000 1098 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 1164) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1093, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
[1]:
time : 2024-11-21_13:30:42
host : 9fd6b25bf7af
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 1165)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-11-21_13:30:42
host : 9fd6b25bf7af
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 1166)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-11-21_13:30:42
host : 9fd6b25bf7af
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1164)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

pip list
Package                           Version
--------------------------------- -------------
absl-py                           2.1.0
accelerate                        0.33.0
aiohappyeyeballs                  2.4.3
aiohttp                           3.11.6
aiosignal                         1.3.1
annotated-types                   0.7.0
anyio                             4.0.0
argon2-cffi                       23.1.0
argon2-cffi-bindings              21.2.0
arrow                             1.3.0
asttokens                         2.4.1
async-lru                         2.0.4
async-timeout                     5.0.1
attrs                             23.1.0
Babel                             2.13.1
beautifulsoup4                    4.12.2
bitsandbytes                      0.44.1
bleach                            6.1.0
blinker                           1.4
certifi                           2022.12.7
cffi                              1.16.0
charset-normalizer                2.1.1
click                             8.1.7
comm                              0.2.0
contourpy                         1.3.1
cryptography                      3.4.8
cut-cross-entropy                 24.11.4
cycler                            0.12.1
datasets                          3.1.0
datatrove                         0.3.0
dbus-python                       1.2.18
debugpy                           1.8.0
decorator                         5.1.1
deepspeed                         0.15.4
defusedxml                        0.7.1
Deprecated                        1.2.15
dill                              0.3.8
distro                            1.7.0
docker-pycreds                    0.4.0
docstring_parser                  0.16
einops                            0.8.0
entrypoints                       0.4
evaluate                          0.4.3
exceptiongroup                    1.1.3
executing                         2.0.1
fastjsonschema                    2.18.1
filelock                          3.9.0
flash-attn                        2.7.0.post2
fonttools                         4.55.0
fqdn                              1.5.1
frozenlist                        1.5.0
fsspec                            2024.9.0
gitdb                             4.0.11
GitPython                         3.1.43
grpcio                            1.68.0
hf_transfer                       0.1.8
hjson                             3.1.0
httplib2                          0.20.2
huggingface-hub                   0.26.2
humanize                          4.11.0
idna                              3.4
importlib-metadata                4.6.4
ipykernel                         6.26.0
ipython                           8.17.2
ipython-genutils                  0.2.0
ipywidgets                        8.1.1
isoduration                       20.11.0
jedi                              0.19.1
jeepney                           0.7.1
Jinja2                            3.1.2
joblib                            1.4.2
json5                             0.9.14
jsonpointer                       2.4
jsonschema                        4.19.2
jsonschema-specifications         2023.7.1
jupyter-archive                   3.4.0
jupyter_client                    7.4.9
jupyter-contrib-core              0.4.2
jupyter-contrib-nbextensions      0.7.0
jupyter_core                      5.5.0
jupyter-events                    0.9.0
jupyter-highlight-selected-word   0.2.0
jupyter-lsp                       2.2.0
jupyter-nbextensions-configurator 0.6.3
jupyter_server                    2.10.0
jupyter_server_terminals          0.4.4
jupyterlab                        4.0.8
jupyterlab-pygments               0.2.2
jupyterlab_server                 2.25.0
jupyterlab-widgets                3.0.9
keyring                           23.5.0
kiwisolver                        1.4.7
launchpadlib                      1.10.16
lazr.restfulclient                0.14.4
lazr.uri                          1.0.6
loguru                            0.7.2
lxml                              4.9.3
Markdown                          3.7
markdown-it-py                    3.0.0
MarkupSafe                        2.1.2
matplotlib                        3.9.2
matplotlib-inline                 0.1.6
mdurl                             0.1.2
mistune                           3.0.2
more-itertools                    8.10.0
mpmath                            1.3.0
msgpack                           1.1.0
multidict                         6.1.0
multiprocess                      0.70.16
nbclassic                         1.0.0
nbclient                          0.9.0
nbconvert                         7.11.0
nbformat                          5.9.2
nest-asyncio                      1.5.8
networkx                          3.0
ninja                             1.11.1.1
nltk                              3.9.1
notebook                          6.5.5
notebook_shim                     0.2.3
numpy                             1.26.4
nvidia-cublas-cu12                12.4.5.8
nvidia-cuda-cupti-cu12            12.4.127
nvidia-cuda-nvrtc-cu12            12.4.127
nvidia-cuda-runtime-cu12          12.4.127
nvidia-cudnn-cu12                 9.1.0.70
nvidia-cufft-cu12                 11.2.1.3
nvidia-curand-cu12                10.3.5.147
nvidia-cusolver-cu12              11.6.1.9
nvidia-cusparse-cu12              12.3.1.170
nvidia-ml-py                      12.560.30
nvidia-nccl-cu12                  2.21.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.4.127
oauthlib                          3.2.0
overrides                         7.4.0
packaging                         23.2
pandas                            2.2.3
pandocfilters                     1.5.0
parso                             0.8.3
peft                              0.13.3.dev0
pexpect                           4.8.0
Pillow                            9.3.0
pip                               23.3.1
platformdirs                      3.11.0
prometheus-client                 0.18.0
prompt-toolkit                    3.0.39
propcache                         0.2.0
protobuf                          3.20.3
psutil                            5.9.6
ptyprocess                        0.7.0
pure-eval                         0.2.2
py-cpuinfo                        9.0.0
pyarrow                           18.0.0
pycparser                         2.21
pydantic                          2.10.0
pydantic_core                     2.27.0
PyGithub                          2.5.0
Pygments                          2.16.1
PyGObject                         3.42.1
PyJWT                             2.10.0
PyNaCl                            1.5.0
pyparsing                         2.4.7
python-apt                        2.4.0+ubuntu2
python-dateutil                   2.8.2
python-json-logger                2.0.7
pytz                              2024.2
PyYAML                            6.0.1
pyzmq                             24.0.1
referencing                       0.30.2
regex                             2024.11.6
requests                          2.32.3
rfc3339-validator                 0.1.4
rfc3986-validator                 0.1.1
rich                              13.9.4
rpds-py                           0.12.0
safetensors                       0.4.5
scikit-learn                      1.5.2
scipy                             1.14.1
SecretStorage                     3.3.1
Send2Trash                        1.8.2
sentencepiece                     0.2.0
sentry-sdk                        2.18.0
setproctitle                      1.3.4
setuptools                        68.2.2
shtab                             1.7.1
six                               1.16.0
smmap                             5.0.1
sniffio                           1.3.0
soupsieve                         2.5
stack-data                        0.6.3
sympy                             1.13.1
tensorboard                       2.18.0
tensorboard-data-server           0.7.2
terminado                         0.17.1
threadpoolctl                     3.5.0
tiktoken                          0.8.0
tinycss2                          1.2.1
tokenizers                        0.19.1
tomli                             2.0.1
torch                             2.5.1
torchaudio                        2.1.0+cu118
torchvision                       0.16.0+cu118
tornado                           6.3.3
tqdm                              4.67.0
traitlets                         5.13.0
transformers                      4.44.2
triton                            3.1.0
trl                               0.11.0
types-python-dateutil             2.8.19.14
typing_extensions                 4.12.2
tyro                              0.9.1
tzdata                            2024.2
unsloth                           2024.11.8
unsloth_zoo                       2024.11.6
uri-template                      1.3.0
urllib3                           1.26.13
wadllib                           1.3.6
wandb                             0.18.7
wcwidth                           0.2.9
webcolors                         1.13
webencodings                      0.5.1
websocket-client                  1.6.4
Werkzeug                          3.1.3
wheel                             0.45.0
widgetsnbextension                4.0.9
wrapt                             1.16.0
xformers                          0.0.28.post3
xxhash                            3.5.0
yarl                              1.17.2
zipp                              1.0.0

@BenjaminBossan
Member

Thanks for trying it out @Amerehei. The error seems to be caused by flash attention 2 and is probably unrelated to the initial issue. Could you try rebuilding the package or not using flash attention?
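
An `undefined symbol` error from a compiled extension such as `flash_attn_2_cuda` usually means the extension was built against a different PyTorch version than the one currently installed. As a rough first check (a sketch, not an official diagnostic), the snippet below lists the installed versions of the relevant packages and flags a `torch`/`torchaudio` major.minor mismatch, on the assumption that recent `torchaudio` releases share the `torch` major.minor number. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages whose compiled parts are tied to a specific torch build.
PACKAGES = ("torch", "torchaudio", "torchvision", "flash-attn")


def installed_versions(names=PACKAGES):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def major_minor(ver):
    """Turn a version such as '2.1.0+cu118' into the tuple (2, 1)."""
    base = ver.split("+")[0]
    parts = base.split(".")
    return int(parts[0]), int(parts[1])


def torchaudio_matches_torch(versions):
    """True/False if both are installed, None if either one is missing.

    Assumption: torchaudio 2.x releases share the torch major.minor number.
    """
    torch_v = versions.get("torch")
    audio_v = versions.get("torchaudio")
    if torch_v is None or audio_v is None:
        return None
    return major_minor(torch_v) == major_minor(audio_v)


versions = installed_versions()
for name, ver in versions.items():
    print(f"{name:12s} {ver}")
print("torchaudio matches torch:", torchaudio_matches_torch(versions))
```

With the versions from the `pip list` output above (`torch 2.5.1`, `torchaudio 2.1.0+cu118`), the check reports a mismatch, which suggests that `torch` was upgraded after the image was built and that `flash-attn` may need to be rebuilt against the new version.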

@Amerehei
Author

Amerehei commented Nov 21, 2024

@BenjaminBossan I've set --use_flash_attn False and, in another run, removed it from the command entirely.
In both cases I got the same error.

I also commented out the attn_implementation param passed to AutoModelForCausalLM.from_pretrained.
Same problem with attn_implementation="sdpa".
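
Going by the traceback above, the failure happens while `transformers.models.llama.modeling_llama` is being imported, so it is plausible that changing `attn_implementation` has no effect as long as the broken `flash-attn` build is installed. A minimal sketch for checking whether the modules import cleanly on their own, outside the training script (`try_import` is a hypothetical helper; the module names are taken from the traceback):

```python
import importlib


def try_import(module_name):
    """Import a module by name and return (ok, error message)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # undefined symbols surface as ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


# Module names as they appear in the traceback; adjust as needed.
for name in ("torch", "flash_attn", "flash_attn_2_cuda"):
    ok, err = try_import(name)
    print(f"{name:20s} {'ok' if ok else 'FAILED'} {err}")
```

If `flash_attn_2_cuda` fails here with the same `undefined symbol` message, the problem lies in the installed package rather than in the training arguments.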

@Amerehei
Author

@BenjaminBossan I found my mistake: a PyTorch 2.1 image was selected by default in today's run. Please disregard the flash attention problem. Here is the new experiment result:

Log (--use_flash_attn True)
Running command: accelerate launch --config_file configs/fsdp_config.yaml  train.py       --seed 100       --model_name_or_path meta-llama/Llama-2-7b-hf       --dataset_name smangrul/ultrachat-10k-chatml       --chat_template_format chatml       --add_special_tokens False       --append_concat_token False       --splits train,test       --max_seq_len 2048       --num_train_epochs 1       --logging_steps 5       --log_level info       --logging_strategy steps       --eval_strategy epoch       --save_strategy epoch       --push_to_hub       --hub_private_repo True       --hub_strategy every_save       --bf16 True       --packing True       --learning_rate 1e-4       --lr_scheduler_type cosine       --weight_decay 1e-4       --warmup_ratio 0.0       --max_grad_norm 1.0       --output_dir mistral-sft-lora-fsdp       --per_device_train_batch_size 8       --per_device_eval_batch_size 8       --gradient_accumulation_steps 4       --gradient_checkpointing True       --use_reentrant False       --dataset_text_field content       --use_flash_attn True       --use_peft_lora True       --lora_r 8       --lora_alpha 16       --lora_dropout 0.1       --lora_target_modules all-linear       --use_4bit_quantization False
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.16it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.49it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.98it/s]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
[rank3]:[W1121 14:40:51.201069492 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 3]  using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank1]:[W1121 14:40:51.326864873 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 1]  using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank2]:[W1121 14:40:51.409507721 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 2]  using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.39s/it]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
[rank0]:[W1121 14:41:01.484323943 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 0]  using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using auto half precision backend
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32008, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaFlashAttention2(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=11008, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=11008, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
            (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
          )
        )
        (norm): LlamaRMSNorm((4096,), eps=1e-05)
        (rotary_emb): LlamaRotaryEmbedding()
      )
      (lm_head): Linear(in_features=4096, out_features=32008, bias=False)
    )
  )
)
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
[2024-11-21 14:41:03,977] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-21 14:41:04,021] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-21 14:41:04,027] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-21 14:41:04,038] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
***** Running training *****
  Num examples = 8
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 4
  Total optimization steps = 1
  Number of trainable parameters = 4,997,120
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.18.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
  0%|                                                                                                                                                                                | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/train.py", line 155, in <module>
    main(model_args, data_args, training_args)
  File "/workspace/train.py", line 139, in main
    trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
    output = super().train(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
    self.optimizer.step()
  File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
    self.optimizer.step(closure)
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
    return func.__get__(opt, opt.__class__)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
    ret = func(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
    adamw(
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
    func(
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
    grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
    return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
    return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
[rank2]: Traceback (most recent call last):
[rank2]:   File "/workspace/train.py", line 155, in <module>
[rank2]:     main(model_args, data_args, training_args)
[rank2]:   File "/workspace/train.py", line 139, in main
[rank2]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
[rank2]:     output = super().train(*args, **kwargs)
[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
[rank2]:     return inner_training_loop(
[rank2]:            ^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank2]:     self.optimizer.step()
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
[rank2]:     self.optimizer.step(closure)
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
[rank2]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
[rank2]:     out = func(*args, **kwargs)
[rank2]:           ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
[rank2]:     ret = func(self, *args, **kwargs)
[rank2]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
[rank2]:     adamw(
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
[rank2]:     func(
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
[rank2]:     grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
[rank2]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
[rank2]:     return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
[rank2]:     return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
[rank1]: Traceback (most recent call last):
[rank1]:   File "/workspace/train.py", line 155, in <module>
[rank1]:     main(model_args, data_args, training_args)
[rank1]:   File "/workspace/train.py", line 139, in main
[rank1]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
[rank1]:     output = super().train(*args, **kwargs)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
[rank1]:     return inner_training_loop(
[rank1]:            ^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank1]:     self.optimizer.step()
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
[rank1]:     self.optimizer.step(closure)
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
[rank1]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
[rank1]:     out = func(*args, **kwargs)
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
[rank1]:     ret = func(self, *args, **kwargs)
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
[rank1]:     adamw(
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
[rank1]:     func(
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
[rank1]:     grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
[rank1]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
[rank1]:     return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
[rank1]:     return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
[rank3]: Traceback (most recent call last):
[rank3]:   File "/workspace/train.py", line 155, in <module>
[rank3]:     main(model_args, data_args, training_args)
[rank3]:   File "/workspace/train.py", line 139, in main
[rank3]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
[rank3]:     output = super().train(*args, **kwargs)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
[rank3]:     return inner_training_loop(
[rank3]:            ^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank3]:     self.optimizer.step()
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
[rank3]:     self.optimizer.step(closure)
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
[rank3]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
[rank3]:     out = func(*args, **kwargs)
[rank3]:           ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
[rank3]:     ret = func(self, *args, **kwargs)
[rank3]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
[rank3]:     adamw(
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
[rank3]:     func(
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
[rank3]:     grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
[rank3]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
[rank3]:     return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
[rank3]:     return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/train.py", line 155, in <module>
[rank0]:     main(model_args, data_args, training_args)
[rank0]:   File "/workspace/train.py", line 139, in main
[rank0]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
[rank0]:     output = super().train(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank0]:     self.optimizer.step()
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
[rank0]:     self.optimizer.step(closure)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
[rank0]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
[rank0]:     out = func(*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
[rank0]:     ret = func(self, *args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
[rank0]:     adamw(
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
[rank0]:     func(
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
[rank0]:     grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
[rank0]:     return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
[rank0]:     return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /workspace/wandb/offline-run-20241121_144135-q8ddduki
wandb: Find logs at: wandb/offline-run-20241121_144135-q8ddduki/logs
W1121 14:41:50.206000 1838 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1904 closing signal SIGTERM
W1121 14:41:50.210000 1838 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1906 closing signal SIGTERM
W1121 14:41:50.210000 1838 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1907 closing signal SIGTERM
E1121 14:41:50.590000 1838 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 1905) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1093, in launch_command
    multi_gpu_launcher(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-11-21_14:41:50
  host      : c00dfa0c2ca1
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 1905)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
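As a sanity check on the numbers in the log above, here is a rough sketch that recomputes the LoRA parameter counts from the shapes printed in the model summary. It assumes each wrapped `Linear` adds one `lora_A` (in_features x r) and one `lora_B` (r x out_features) matrix, which is what the printed modules show, and that `lm_head` is not wrapped (it is printed as a plain `Linear`). The division by 4 is my assumption based on the four `[rankN]` prefixes in the log; the log itself does not say how the trainable parameters are split.

```python
# Shapes and settings read off the log above.
HIDDEN = 4096                # in/out features of q_proj, k_proj, v_proj, o_proj
INTERMEDIATE = 11008         # MLP width (gate_proj, up_proj, down_proj)
RANK = 8                     # --lora_r 8
NUM_LAYERS = 32              # "(0-31): 32 x LlamaDecoderLayer"
ALL_PARAMS = 6_758_469_632   # "all params" as reported in the log
NUM_RANKS = 4                # assumption: ranks 0-3 appear in the log


def lora_params(in_features: int, out_features: int, r: int = RANK) -> int:
    """Parameters added by one LoRA adapter: A is (in x r), B is (r x out)."""
    return in_features * r + r * out_features


# Four attention projections per layer, all HIDDEN -> HIDDEN.
attention = 4 * lora_params(HIDDEN, HIDDEN)

# gate_proj and up_proj are HIDDEN -> INTERMEDIATE; down_proj is the reverse.
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE) + lora_params(INTERMEDIATE, HIDDEN)

trainable = NUM_LAYERS * (attention + mlp)
trainable_pct = 100 * trainable / ALL_PARAMS
per_rank = trainable // NUM_RANKS

print(f"trainable params: {trainable:,}")
print(f"trainable%: {trainable_pct:.4f}")
print(f"per rank (assuming an even 4-way split): {per_rank:,}")
```

The total matches the `trainable params: 19,988,480` line, and the per-rank value matches `Number of trainable parameters = 4,997,120`, so the Trainer seems to be reporting the per-rank share of the LoRA weights rather than the full count.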
Log (--use_flash_attn False)
Running command: accelerate launch --config_file configs/fsdp_config.yaml  train.py       --seed 100       --model_name_or_path meta-llama/Llama-2-7b-hf       --dataset_name smangrul/ultrachat-10k-chatml       --chat_template_format chatml       --add_special_tokens False       --append_concat_token False       --splits train,test       --max_seq_len 2048       --num_train_epochs 1       --logging_steps 5       --log_level info       --logging_strategy steps       --eval_strategy epoch       --save_strategy epoch       --push_to_hub       --hub_private_repo True       --hub_strategy every_save       --bf16 True       --packing True       --learning_rate 1e-4       --lr_scheduler_type cosine       --weight_decay 1e-4       --warmup_ratio 0.0       --max_grad_norm 1.0       --output_dir mistral-sft-lora-fsdp       --per_device_train_batch_size 8       --per_device_eval_batch_size 8       --gradient_accumulation_steps 4       --gradient_checkpointing True       --use_reentrant False       --dataset_text_field content       --use_flash_attn False       --use_peft_lora True       --lora_r 8       --lora_alpha 16       --lora_dropout 0.1       --lora_target_modules all-linear       --use_4bit_quantization False
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 609/609 [00:00<00:00, 1.79MB/s]
model.safetensors.index.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 48.8MB/s]
model-00001-of-00002.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.98G/9.98G [00:47<00:00, 212MB/s]
model-00002-of-00002.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [00:17<00:00, 195MB/s]
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:05<00:00, 32.64s/it]
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:05<00:00, 32.62s/it]
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:05<00:00, 32.62s/it]
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:05<00:00, 32.64s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  6.88it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.20it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  6.14it/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 188/188 [00:00<00:00, 572kB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 776/776 [00:00<00:00, 3.61MB/s]
tokenizer.model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 112MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 22.3MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 1.26MB/s]
README.md: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 524/524 [00:00<00:00, 1.72MB/s]
train-00000-of-00001.parquet: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35.2M/35.2M [00:00<00:00, 42.4MB/s]
test-00000-of-00001.parquet: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.08M/7.08M [00:00<00:00, 40.0MB/s]
Generating train split: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 35997.11 examples/s]
Generating test split: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 35983.77 examples/s]
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 230.91 examples/s]
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 295.64 examples/s]
Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2201.27 examples/s]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2222.97 examples/s]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 311.74 examples/s]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Loading checkpoint shards:  50%|██████████████████████████████████████████████████████████████████████▌                                                                      | 1/2 [00:08<00:08,  8.60s/it][rank3]:[W1121 14:38:20.995918785 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 3]  using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank2]:[W1121 14:38:20.000367175 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 2]  using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
[rank1]:[W1121 14:38:20.005100558 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 1]  using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.53s/it]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. 
Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
Generating train split: 8 examples [00:00, 278.11 examples/s]
Generating train split: 8 examples [00:00, 309.70 examples/s]
[rank0]:[W1121 14:38:29.329199115 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 0]  using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device,or call init_process_group() with a device_id.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
Using auto half precision backend
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32008, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=11008, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=11008, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
            (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
          )
        )
        (norm): LlamaRMSNorm((4096,), eps=1e-05)
        (rotary_emb): LlamaRotaryEmbedding()
      )
      (lm_head): Linear(in_features=4096, out_features=32008, bias=False)
    )
  )
)
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
[2024-11-21 14:38:31,860] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-21 14:38:31,871] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-21 14:38:31,894] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-21 14:38:31,920] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
***** Running training *****
  Num examples = 8
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 4
  Total optimization steps = 1
  Number of trainable parameters = 4,997,120
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.18.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
  0%|                                                                                                                                                                                | 0/1 [00:00
[rank1]:     main(model_args, data_args, training_args)
[rank1]:   File "/workspace/train.py", line 139, in main
[rank1]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
[rank1]:     output = super().train(*args, **kwargs)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
[rank1]:     return inner_training_loop(
[rank1]:            ^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank1]:     self.optimizer.step()
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
[rank1]:     self.optimizer.step(closure)
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
[rank1]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
[rank1]:     out = func(*args, **kwargs)
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
[rank1]:     ret = func(self, *args, **kwargs)
[rank1]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
[rank1]:     adamw(
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
[rank1]:     func(
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
[rank1]:     grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
[rank1]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
[rank1]:     return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
[rank1]:     return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
Traceback (most recent call last):
[rank3]: Traceback (most recent call last):
[rank3]:   File "/workspace/train.py", line 155, in <module>
[rank3]:     main(model_args, data_args, training_args)
[rank3]:   File "/workspace/train.py", line 139, in main
[rank3]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
[rank3]:     output = super().train(*args, **kwargs)
[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
[rank3]:     return inner_training_loop(
[rank3]:            ^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank3]:     self.optimizer.step()
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
[rank3]:     self.optimizer.step(closure)
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
[rank3]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
[rank3]:     out = func(*args, **kwargs)
[rank3]:           ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
[rank3]:     ret = func(self, *args, **kwargs)
[rank3]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
[rank3]:     adamw(
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
[rank3]:     func(
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
[rank3]:     grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
[rank3]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
[rank3]:     return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
[rank3]:     return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
  File "/workspace/train.py", line 155, in <module>
    main(model_args, data_args, training_args)
  File "/workspace/train.py", line 139, in main
    trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
    output = super().train(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
    self.optimizer.step()
  File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
    self.optimizer.step(closure)
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
    return func.__get__(opt, opt.__class__)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
    ret = func(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
    adamw(
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
    func(
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
    grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
    return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
    return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
[rank2]: Traceback (most recent call last):
[rank2]:   File "/workspace/train.py", line 155, in <module>
[rank2]:     main(model_args, data_args, training_args)
[rank2]:   File "/workspace/train.py", line 139, in main
[rank2]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
[rank2]:     output = super().train(*args, **kwargs)
[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
[rank2]:     return inner_training_loop(
[rank2]:            ^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank2]:     self.optimizer.step()
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
[rank2]:     self.optimizer.step(closure)
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
[rank2]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
[rank2]:     out = func(*args, **kwargs)
[rank2]:           ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
[rank2]:     ret = func(self, *args, **kwargs)
[rank2]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
[rank2]:     adamw(
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
[rank2]:     func(
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
[rank2]:     grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
[rank2]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
[rank2]:     return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
[rank2]:     return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/train.py", line 155, in <module>
[rank0]:     main(model_args, data_args, training_args)
[rank0]:   File "/workspace/train.py", line 139, in main
[rank0]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py", line 434, in train
[rank0]:     output = super().train(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 1929, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank0]:     self.optimizer.step()
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/optimizer.py", line 170, in step
[rank0]:     self.optimizer.step(closure)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
[rank0]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 487, in wrapper
[rank0]:     out = func(*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 91, in _use_grad
[rank0]:     ret = func(self, *args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 220, in step
[rank0]:     adamw(
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 782, in adamw
[rank0]:     func(
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
[rank0]:     grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
[rank0]:     return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)  # type: ignore[return-value, arg-type]
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
[rank0]:     return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /workspace/wandb/offline-run-20241121_143903-enf1o3qd
wandb: Find logs at: wandb/offline-run-20241121_143903-enf1o3qd/logs
W1121 14:39:25.634000 1364 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1430 closing signal SIGTERM
W1121 14:39:25.635000 1364 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1431 closing signal SIGTERM
W1121 14:39:25.635000 1364 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1433 closing signal SIGTERM
E1121 14:39:25.977000 1364 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 2 (pid: 1432) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1093, in launch_command
    multi_gpu_launcher(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-11-21_14:39:25
  host      : c00dfa0c2ca1
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 1432)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
pip list
Package                           Version
--------------------------------- --------------
absl-py                           2.1.0
accelerate                        0.33.0
aiohappyeyeballs                  2.4.3
aiohttp                           3.11.6
aiosignal                         1.3.1
annotated-types                   0.7.0
anyio                             4.6.0
argon2-cffi                       23.1.0
argon2-cffi-bindings              21.2.0
arrow                             1.3.0
asttokens                         2.4.1
async-lru                         2.0.4
attrs                             24.2.0
babel                             2.16.0
beautifulsoup4                    4.12.3
bitsandbytes                      0.44.1
bleach                            6.1.0
blinker                           1.4
certifi                           2024.8.30
cffi                              1.17.1
charset-normalizer                3.3.2
click                             8.1.7
comm                              0.2.2
contourpy                         1.3.1
cryptography                      3.4.8
cut-cross-entropy                 24.11.4
cycler                            0.12.1
datasets                          3.1.0
datatrove                         0.3.0
dbus-python                       1.2.18
debugpy                           1.8.5
decorator                         5.1.1
deepspeed                         0.15.4
defusedxml                        0.7.1
Deprecated                        1.2.15
dill                              0.3.8
distro                            1.7.0
docker-pycreds                    0.4.0
docstring_parser                  0.16
einops                            0.8.0
entrypoints                       0.4
evaluate                          0.4.3
executing                         2.1.0
fastjsonschema                    2.20.0
filelock                          3.13.1
flash-attn                        2.7.0.post2
fonttools                         4.55.0
fqdn                              1.5.1
frozenlist                        1.5.0
fsspec                            2024.2.0
gitdb                             4.0.11
GitPython                         3.1.43
grpcio                            1.68.0
h11                               0.14.0
hf_transfer                       0.1.8
hjson                             3.1.0
httpcore                          1.0.5
httplib2                          0.20.2
httpx                             0.27.2
huggingface-hub                   0.26.2
humanize                          4.11.0
idna                              3.10
importlib-metadata                4.6.4
ipykernel                         6.29.5
ipython                           8.27.0
ipython-genutils                  0.2.0
ipywidgets                        8.1.5
isoduration                       20.11.0
jedi                              0.19.1
jeepney                           0.7.1
Jinja2                            3.1.3
joblib                            1.4.2
json5                             0.9.25
jsonpointer                       3.0.0
jsonschema                        4.23.0
jsonschema-specifications         2023.12.1
jupyter-archive                   3.4.0
jupyter_client                    7.4.9
jupyter_contrib_core              0.4.2
jupyter_contrib_nbextensions      0.7.0
jupyter_core                      5.7.2
jupyter-events                    0.10.0
jupyter-highlight-selected-word   0.2.0
jupyter-lsp                       2.2.5
jupyter_nbextensions_configurator 0.6.4
jupyter_server                    2.14.2
jupyter_server_terminals          0.5.3
jupyterlab                        4.2.5
jupyterlab_pygments               0.3.0
jupyterlab_server                 2.27.3
jupyterlab_widgets                3.0.13
keyring                           23.5.0
kiwisolver                        1.4.7
launchpadlib                      1.10.16
lazr.restfulclient                0.14.4
lazr.uri                          1.0.6
loguru                            0.7.2
lxml                              5.3.0
Markdown                          3.7
markdown-it-py                    3.0.0
MarkupSafe                        2.1.5
matplotlib                        3.9.2
matplotlib-inline                 0.1.7
mdurl                             0.1.2
mistune                           3.0.2
more-itertools                    8.10.0
mpmath                            1.3.0
msgpack                           1.1.0
multidict                         6.1.0
multiprocess                      0.70.16
nbclassic                         1.1.0
nbclient                          0.10.0
nbconvert                         7.16.4
nbformat                          5.10.4
nest-asyncio                      1.6.0
networkx                          3.2.1
ninja                             1.11.1.1
nltk                              3.9.1
notebook                          6.5.5
notebook_shim                     0.2.4
numpy                             1.26.3
nvidia-cublas-cu12                12.4.5.8
nvidia-cuda-cupti-cu12            12.4.127
nvidia-cuda-nvrtc-cu12            12.4.127
nvidia-cuda-runtime-cu12          12.4.127
nvidia-cudnn-cu12                 9.1.0.70
nvidia-cufft-cu12                 11.2.1.3
nvidia-curand-cu12                10.3.5.147
nvidia-cusolver-cu12              11.6.1.9
nvidia-cusparse-cu12              12.3.1.170
nvidia-ml-py                      12.560.30
nvidia-nccl-cu12                  2.21.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.4.127
oauthlib                          3.2.0
overrides                         7.7.0
packaging                         24.1
pandas                            2.2.3
pandocfilters                     1.5.1
parso                             0.8.4
peft                              0.13.3.dev0
pexpect                           4.9.0
pillow                            10.2.0
pip                               24.2
platformdirs                      4.3.6
prometheus_client                 0.21.0
prompt_toolkit                    3.0.47
propcache                         0.2.0
protobuf                          3.20.3
psutil                            6.0.0
ptyprocess                        0.7.0
pure_eval                         0.2.3
py-cpuinfo                        9.0.0
pyarrow                           18.0.0
pycparser                         2.22
pydantic                          2.10.0
pydantic_core                     2.27.0
PyGithub                          2.5.0
Pygments                          2.18.0
PyGObject                         3.42.1
PyJWT                             2.10.0
PyNaCl                            1.5.0
pyparsing                         2.4.7
python-apt                        2.4.0+ubuntu4
python-dateutil                   2.9.0.post0
python-json-logger                2.0.7
pytz                              2024.2
PyYAML                            6.0.2
pyzmq                             24.0.1
referencing                       0.35.1
regex                             2024.11.6
requests                          2.32.3
rfc3339-validator                 0.1.4
rfc3986-validator                 0.1.1
rich                              13.9.4
rpds-py                           0.20.0
safetensors                       0.4.5
scikit-learn                      1.5.2
scipy                             1.14.1
SecretStorage                     3.3.1
Send2Trash                        1.8.3
sentencepiece                     0.2.0
sentry-sdk                        2.18.0
setproctitle                      1.3.4
setuptools                        75.1.0
shtab                             1.7.1
six                               1.16.0
smmap                             5.0.1
sniffio                           1.3.1
soupsieve                         2.6
stack-data                        0.6.3
sympy                             1.13.1
tensorboard                       2.18.0
tensorboard-data-server           0.7.2
terminado                         0.18.1
threadpoolctl                     3.5.0
tiktoken                          0.8.0
tinycss2                          1.3.0
tokenizers                        0.19.1
torch                             2.5.1
torchaudio                        2.4.1+cu124
torchvision                       0.19.1+cu124
tornado                           6.4.1
tqdm                              4.67.0
traitlets                         5.14.3
transformers                      4.44.2
triton                            3.1.0
trl                               0.11.0
types-python-dateutil             2.9.0.20240906
typing_extensions                 4.12.2
tyro                              0.9.1
tzdata                            2024.2
unsloth                           2024.11.8
unsloth_zoo                       2024.11.6
uri-template                      1.3.0
urllib3                           2.2.3
wadllib                           1.3.6
wandb                             0.18.7
wcwidth                           0.2.13
webcolors                         24.8.0
webencodings                      0.5.1
websocket-client                  1.8.0
Werkzeug                          3.1.3
wheel                             0.44.0
widgetsnbextension                4.0.13
wrapt                             1.16.0
xformers                          0.0.28.post3
xxhash                            3.5.0
yarl                              1.17.2
zipp                              1.0.0

@Amerehei
Author

@BenjaminBossan I run it on Runpod. I start from the runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04 image, but installing the pip requirements upgrades PyTorch to 2.5.1.

@vrancurel

It worked for me! Thank you @BenjaminBossan!

@Amerehei
Author

@vrancurel Can you share your environment details, including the Python version, the CUDA version, and the pip list?
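
For anyone answering a request like this, here is a minimal sketch that prints the Python version and the installed versions of the packages discussed in this thread. It uses only the standard library. The package list and the function name `collect_versions` are assumptions chosen for illustration, and the CUDA version is left out (the `nvcc --version` output in the original post covers that).

```python
import platform
from importlib import metadata

# Packages mentioned in this thread; adjust the list as needed.
PACKAGES = ["torch", "transformers", "accelerate", "peft", "trl",
            "bitsandbytes", "flash-attn"]


def collect_versions(packages=PACKAGES):
    """Map 'python' and each package name to its installed version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            report[name] = "not installed"
    return report


# Print one "name==version" line per entry, similar to `pip freeze`.
for name, version in collect_versions().items():
    print(f"{name}=={version}")
```

Pasting this output is shorter than a full `pip list` and still shows the versions that matter for this error.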

@vrancurel

Python 3.10, CUDA 12.4, and exactly the same versions as you suggested.

@winglian
Contributor

winglian commented Dec 8, 2024

@BenjaminBossan This seems to be a transformers issue for me (I'll track down the exact commit later today). On the latest accelerate (1.2.0), peft (0.14.0), bnb (0.45.0), and transformers (4.47.0), I'm able to reproduce this in axolotl's test suite for FSDP, and simply downgrading transformers to 4.46.3 solves the issue for me.

@winglian
Contributor

winglian commented Dec 8, 2024

@BenjaminBossan @muellerzr Looks like it's this change from October that's breaking it on the latest releases: huggingface/transformers@8b3b9b4

@BenjaminBossan
Member

Thanks for investigating this @winglian. From the error message, it is plausible that the change you cited would have the effect we see.

It's probably not as easy as just reverting the commit, as the change appears to have been made on purpose, but to me it's not clear how this relates to FP8 mixed precision. Hopefully Zach can give some insights.
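
The RuntimeError posted earlier in the thread says that tensors at the same index differ in device or dtype. As a rough diagnostic sketch (not the fix that was later merged), the helper below walks an optimizer and lists every gradient or state tensor whose device or dtype differs from its parameter. The name `find_mismatches` is hypothetical. It assumes the object exposes `param_groups` and `state` the way `torch.optim.Optimizer` does, and it may not reflect how FSDP shards or wraps parameters.

```python
def find_mismatches(optimizer):
    """Return a list of (param_index, tensor_name, device, dtype) for every
    gradient or state tensor that differs from its parameter."""
    mismatches = []
    index = 0
    for group in optimizer.param_groups:
        for param in group["params"]:
            expected = (str(param.device), str(param.dtype))
            candidates = {"grad": getattr(param, "grad", None)}
            # Optimizer state such as exp_avg / exp_avg_sq. The error text
            # exempts `step`, so it is skipped here as well.
            for key, value in optimizer.state.get(param, {}).items():
                if key != "step":
                    candidates[key] = value
            for name, tensor in candidates.items():
                if tensor is None or not hasattr(tensor, "dtype"):
                    continue  # no gradient yet, or a non-tensor state entry
                actual = (str(tensor.device), str(tensor.dtype))
                if actual != expected:
                    mismatches.append((index, name, *actual))
            index += 1
    return mismatches
```

Calling it on the optimizer right before the failing `optimizer.step()` and printing the result should show which tensors disagree, for example a float32 gradient next to a bfloat16 parameter.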

@BenjaminBossan
Member

For everyone in this discussion: Installing transformers from source (or waiting for the next release) should fix this problem. If you still encounter the same error, try turning off flash attention and see if that helps. Report back if you still have trouble.
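
One way to turn off flash attention when loading a model with transformers is the `attn_implementation` argument of `from_pretrained`. The sketch below is only an illustration: the helper names `attention_kwargs` and `load_model` are hypothetical, and how the example training script exposes this setting is not shown in the thread.

```python
def attention_kwargs(use_flash_attention: bool) -> dict:
    """Build the keyword argument that selects the attention backend."""
    # "sdpa" is PyTorch's scaled dot product attention;
    # "flash_attention_2" requires the flash-attn package.
    backend = "flash_attention_2" if use_flash_attention else "sdpa"
    return {"attn_implementation": backend}


def load_model(model_name: str, use_flash_attention: bool = False):
    """Load a causal LM with the chosen attention backend."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_name, **attention_kwargs(use_flash_attention)
    )
```

For example, `load_model("meta-llama/Llama-2-7b-hf")` would load the model from the original post with SDPA instead of flash attention, assuming access to that checkpoint.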

@invis166

invis166 commented Dec 18, 2024

Tried training with sdpa instead of flash_attention_2, but I'm still getting the same error. The issue does not occur with transformers 4.46.3, but it does with 4.47.

torch 2.5.1, cuda 11.8, cudnn 9
transformers==4.47.0
peft==0.13.2
bitsandbytes==0.44.1
flash-attn==2.7.0.post2
accelerate==1.1.1

@invis166

Btw, I just tried transformers 4.47.1, which was released yesterday, and it works.

@BenjaminBossan
Member

Btw, I just tried transformers 4.47.1, which was released yesterday, and it works.

Great to hear that. Yes, the fix is in 4.47.1 but not in 4.47.0, which is why your previous test failed.
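
To check which side of the fix an environment is on, a small sketch like the following may help. It relies only on what was reported in this thread: 4.47.0 fails, 4.47.1 works, and 4.46.3 works. The function names are hypothetical, and treating a dev build such as `4.47.0.dev0` like its base release is an assumption that may not hold for every source install.

```python
import re
from importlib import metadata

AFFECTED = (4, 47, 0)  # release reported as failing in this thread
FIXED = (4, 47, 1)     # release reported as working in this thread


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release, e.g. '4.47.0.dev0' -> (4, 47, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def classify(version: str) -> str:
    """Classify a transformers version relative to the fix."""
    release = release_tuple(version)
    if release >= FIXED:
        return "fixed"
    if release == AFFECTED:
        return "affected"
    return "older"  # before 4.47.0; 4.46.3 was reported working


try:
    installed = metadata.version("transformers")
    print(f"transformers {installed}: {classify(installed)}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```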
