Training speed is very slow #30

Hi, I use grad_cache to train my model, but it seems very slow. I want to know: is this normal? Does using grad cache generally affect the training speed?

Comments
It'd be hard to diagnose this based on qualitative descriptions. Maybe you can share some of your setup, observed & reference throughput/latency, etc.
Thanks. I am using the Hugging Face Trainer to train a Qwen-7B model; here is my setup and the corresponding code: ① the compute-loss function, which overrides the original Trainer compute_loss:
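(The code itself is not reproduced in this thread, so below is a minimal sketch of what such an override could look like, assuming a bi-encoder retrieval setup. The class name GCTrainer, the inputs["query"]/inputs["passage"] keys, and the pooling lambda are illustrative assumptions rather than the original code; how GradCache's internal backward pass interacts with the Trainer's own training loop is left out of this sketch.)

```python
# Minimal sketch (not the poster's actual code): a Trainer subclass whose
# compute_loss runs a GradCache step over a shared query/passage encoder.
# GCTrainer, the "query"/"passage" input keys, and the pooling lambda are
# illustrative assumptions.
from transformers import Trainer
from grad_cache import GradCache


class GCTrainer(Trainer):
    def __init__(self, *args, chunk_size: int = 1, contrastive_loss_fn=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.gc = GradCache(
            models=[self.model, self.model],  # same Qwen-7B encoder for queries and passages
            chunk_sizes=chunk_size,           # 1 here; 4/16/64 were also tried
            loss_fn=contrastive_loss_fn,      # e.g. the temperature-scaled loss sketched below
            get_rep_fn=lambda out: out.last_hidden_state[:, 0],  # assumed pooling choice
        )

    def compute_loss(self, model, inputs, return_outputs=False):
        # GradCache does the chunked no-grad forward, caches representation
        # gradients, then replays the forward chunk by chunk and backpropagates,
        # returning the loss value for logging.
        loss = self.gc(inputs["query"], inputs["passage"])
        return loss
```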
As you can see, I set the chunk size to 1, and I also tried chunk sizes of 4/16/64; the batch size in the Trainer settings is 256. ② The DistributedContrastiveLoss function, similar to the DistributedContrastiveLoss in loss.py of the grad_cache package, with only a temperature parameter added to scale the scores; it uses a SimpleContrastiveLoss like this:
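(A sketch of the described loss, modeled on the structure of SimpleContrastiveLoss/DistributedContrastiveLoss in grad_cache's loss.py with a temperature term added; exact signatures may differ from the installed version, and the temperature default is an assumption.)

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch import Tensor


class SimpleContrastiveLoss:
    def __init__(self, temperature: float = 0.05):
        self.temperature = temperature

    def __call__(self, x: Tensor, y: Tensor, target: Tensor = None, reduction: str = 'mean') -> Tensor:
        # In-batch negatives: query i is matched to passage i * (passages per query).
        if target is None:
            target_per_qry = y.size(0) // x.size(0)
            target = torch.arange(
                0, x.size(0) * target_per_qry, target_per_qry,
                device=x.device, dtype=torch.long)
        logits = torch.matmul(x, y.transpose(0, 1)) / self.temperature  # temperature-scaled scores
        return F.cross_entropy(logits, target, reduction=reduction)


class DistributedContrastiveLoss(SimpleContrastiveLoss):
    def __init__(self, temperature: float = 0.05, scale_loss: bool = True):
        super().__init__(temperature=temperature)
        self.scale_loss = scale_loss
        self.world_size = dist.get_world_size()
        self.rank = dist.get_rank()

    def __call__(self, x: Tensor, y: Tensor, **kwargs) -> Tensor:
        # Gather representations from all ranks so every GPU sees the full
        # in-batch negative pool, then compute the loss on the gathered tensors.
        dist_x = self.gather_tensor(x)
        dist_y = self.gather_tensor(y)
        loss = super().__call__(dist_x, dist_y, **kwargs)
        if self.scale_loss:
            loss = loss * self.world_size  # compensate for cross-process gradient averaging
        return loss

    def gather_tensor(self, t: Tensor) -> Tensor:
        gathered = [torch.empty_like(t) for _ in range(self.world_size)]
        dist.all_gather(gathered, t)
        gathered[self.rank] = t  # keep the local tensor so gradients flow back to it
        return torch.cat(gathered, dim=0)
```

Since GradCache calls the loss on the full, already-concatenated representations, the cross-process gather naturally lives inside the loss rather than in the encoder forward.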
Finally, the code runs successfully, but the gradient update is extremely slow.
Can you share the observed runtime and a reference runtime? Meanwhile, one thing to note is that the Hugging Face Trainer can trigger features like DeepSpeed ZeRO, which came after the GradCache release and therefore may not be smoothly supported.
@luyug I think I have figured this problem out, thanks.
As you can see, the loss is always around 2, and if I don't use grad cache, the loss can converge to 0.2.
Hi, I am having a similar issue. The loss does not converge after using GradCache. Did you solve this issue?