can't compile llama-3-8B or llama-3.1-8B with lora if batch size is more than 1 #709
Labels: bug
System Info
Who can help?
@michaelbenayoun
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction (minimal, reproducible, runnable)
I am trying to fine-tune Llama-3-8B on a single trn1.2xlarge instance. I am following the tutorial here: https://huggingface.co/docs/optimum-neuron/en/training_tutorials/sft_lora_finetune_llm, but changing the PROCESSES_PER_NODE and TP_DEGREE variables. My compilation script looks like this:
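The script itself did not survive the page capture. As a rough sketch only, a compile step following that tutorial generally has this shape; every value and flag below is an assumption for illustration, not the reporter's actual settings:

```shell
# Sketch of a graph pre-compilation launch in the style of the
# optimum-neuron SFT LoRA tutorial. Values here are illustrative guesses.
PROCESSES_PER_NODE=2   # trn1.2xlarge exposes 2 Neuron cores
TP_DEGREE=2            # tensor parallelism across both cores
BS=4                   # a per-device batch size > 1 is what triggers the failure
OUTPUT_DIR=llama3_lora_output

# neuron_parallel_compile extracts and compiles the graphs ahead of time;
# rerunning the same torchrun command without it performs the real training.
XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE \
  sft_lora_finetune_llm.py \
  --model_id meta-llama/Meta-Llama-3-8B \
  --do_train \
  --max_steps 10 \
  --per_device_train_batch_size $BS \
  --gradient_checkpointing true \
  --bf16 \
  --tensor_parallel_size $TP_DEGREE \
  --output_dir $OUTPUT_DIR
```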
However, during compilation of some graphs I get this error:
I can compile and complete training without error if I set the batch size to 1; however, I would like to increase the batch size to speed up training.
I also get these warnings which may be relevant:
Expected behavior
I expect the model to compile and the training script to run without errors.