multi gpu use? #59
Comments
I am running out of memory on a Tesla T4. I have 4 of them, though, and I usually use accelerate for multi-GPU setups. How can I use them for AnglE semantic similarity?
Do you use it for training or inference?
I used it for training. It looks like the script does use multiple GPUs, but it runs out of memory due to the high batch size. I will close this ticket.
I used the example provided and also tried accelerate, but both fail to use more than 1 GPU. Any suggestions?
Hi @ganeshkrishnan1, could you provide the training script?
Here is one example that can be run successfully on multiple GPUs:

```bash
CUDA_VISIBLE_DEVICES=0,1 WANDB_MODE=disabled torchrun --nproc_per_node=2 --master_port=2345 train_cli.py \
    --model_name_or_path mixedbread-ai/mxbai-embed-large-v1 \
    --train_name_or_path ./snli_5k.jsonl --save_dir mxbai-snli-ckpts \
    --w1 0. --w2 20.0 --w3 1.0 --angle_tau 20.0 --learning_rate 3e-6 --maxlen 64 \
    --pooling_strategy cls \
    --epochs 1 \
    --batch_size 32 \
    --logging_steps 100 \
    --warmup_steps 200 \
    --save_steps 1000 --seed 42 --gradient_accumulation_steps 2 --fp16 1 --torch_dtype 'float32'
```

train_cli.py is from: https://github.com/SeanLee97/AnglE/blob/main/angle_emb/train_cli.py

Data format:

```bash
$ head -3 snli_5k.jsonl
{"text": "A person on a horse jumps over a broken down airplane.", "positive": "A person is outdoors, on a horse.", "negative": "A person is at a diner, ordering an omelette."}
{"text": "Children smiling and waving at camera", "positive": "There are children present", "negative": "The kids are frowning"}
{"text": "A boy is jumping on skateboard in the middle of a red bridge.", "positive": "The boy does a skateboarding trick.", "negative": "The boy skates down the sidewalk."}
```
This is my Python code:
I haven't tried multi-GPU from Python code; I've only used the multi-GPU support that comes with the Transformers Trainer. BTW, here are some tips to improve the model:
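For anyone who wants the pure-Python equivalent of the CLI command above, here is a rough sketch based on the fit interface shown in the AnglE README, launched under torchrun the same way as train_cli.py. Note the loss-kwarg names vary across angle_emb versions (w1/w2/w3 in some releases, cosine_w/ibn_w/angle_w in others), so check your installed version:

```python
# train_angle.py -- run with, e.g.:
#   CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train_angle.py
# Sketch based on the AnglE README; hyperparameters mirror the CLI example above.
from datasets import load_dataset
from angle_emb import AnglE, AngleDataTokenizer

angle = AnglE.from_pretrained(
    'mixedbread-ai/mxbai-embed-large-v1',
    max_length=64,
    pooling_strategy='cls',
).cuda()

# Rows look like {"text": ..., "positive": ..., "negative": ...} (DatasetFormats.B).
ds = load_dataset('json', data_files='snli_5k.jsonl')['train']
train_ds = ds.shuffle().map(
    AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)

angle.fit(
    train_ds=train_ds,
    output_dir='mxbai-snli-ckpts',
    batch_size=32,
    epochs=1,
    learning_rate=3e-6,
    warmup_steps=200,
    logging_steps=100,
    save_steps=1000,
    gradient_accumulation_steps=2,
    fp16=True,
    # Loss-kwarg names may differ in your angle_emb version.
    loss_kwargs={'w1': 0.0, 'w2': 20.0, 'w3': 1.0, 'angle_tau': 20.0},
)
```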
Thanks for the tip about the w values. I am using DataFormat C; should I use the same settings as for B?
Negatives are very hard to generate from unlabelled text for Dataset B. We have "product title" -> "search term" as a positive correlation, but there is no correct way to generate negatives. Like you mentioned, the performance of Dataset C when training on a sample was not as good as I wanted it to be. I am running the trainer on our whole dataset of 200m records and will report back on performance (~15 days).
For such large datasets, it is better to specify a small learning_rate, such as 1e-6, and specify …
I don't mind catastrophic forgetting. I could even train from scratch with the amount of data we have. The learning rate is currently set to 3e-6. It took 8 hours for the dataset to load, so I think I will let this training run and then re-run with the smaller learning rate you mentioned. Your models don't seem compatible with KeyBERT (https://github.com/MaartenGr/keyBERT), so that's one more challenge for me.
I found KeyBERT works with sentence-transformers. Maybe you can add a feature to make it support angle_emb.
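For reference, a minimal sketch of the sentence-transformers route; KeyBERT accepts any SentenceTransformer instance as its embedding backend, and the model name here is just the one from the training example above:

```python
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer

# KeyBERT can wrap any sentence-transformers model as its embedding backend.
st_model = SentenceTransformer('mixedbread-ai/mxbai-embed-large-v1')
kw_model = KeyBERT(model=st_model)

doc = "A boy is jumping on a skateboard in the middle of a red bridge."
keywords = kw_model.extract_keywords(doc, keyphrase_ngram_range=(1, 2), top_n=5)
print(keywords)  # list of (keyphrase, similarity score) tuples
```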
I will ask someone from our team to look into it. Right now it's easier for me to use this for generating vectors and to train a different sentence-transformers model for generating keywords from documents: two different use cases.
BTW, can my team member reach out to you by email to get some support for adding angle_emb support to sentence-transformers?
Sure! Thanks! BTW, I am working on exporting sentence-transformers (ST) models so that AnglE-trained models can be used in ST.
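Until that lands, one possible workaround is composing an ST model from raw modules. A sketch, assuming the AnglE checkpoint is a standard Hugging Face transformer trained with CLS pooling; 'mxbai-snli-ckpts' is the save_dir from the training example above:

```python
from sentence_transformers import SentenceTransformer, models

# Wrap an AnglE-trained Hugging Face checkpoint as a SentenceTransformer by
# composing a Transformer module with a CLS-pooling module (matching the
# --pooling_strategy cls setting used at training time).
word = models.Transformer('mxbai-snli-ckpts', max_seq_length=64)
pooling = models.Pooling(word.get_word_embedding_dimension(), pooling_mode='cls')
st_model = SentenceTransformer(modules=[word, pooling])

embeddings = st_model.encode(["A person on a horse jumps over a broken down airplane."])
print(embeddings.shape)
```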
Hi @JianyangHu, here is a multi-GPU inference example: https://github.com/SeanLee97/AnglE/blob/main/examples/multigpu_infer.py
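The general pattern there is one worker process per GPU, each loading its own copy of the model. A minimal sketch of that pattern (not the linked script itself), assuming the AnglE.from_pretrained / encode API from the README:

```python
# Sketch: shard texts across N GPUs, one spawned worker per device.
import os
from multiprocessing import get_context

def encode_shard(args):
    gpu_id, texts = args
    # Pin this worker to a single GPU before any CUDA initialization.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    from angle_emb import AnglE  # import after setting the device mask
    angle = AnglE.from_pretrained(
        'mixedbread-ai/mxbai-embed-large-v1', pooling_strategy='cls').cuda()
    return angle.encode(texts, to_numpy=True)

if __name__ == '__main__':
    texts = ['document %d ...' % i for i in range(1000)]  # your documents here
    n_gpus = 4
    shards = [(i, texts[i::n_gpus]) for i in range(n_gpus)]
    with get_context('spawn').Pool(n_gpus) as pool:
        results = pool.map(encode_shard, shards)
    # results[i] holds the embeddings for texts[i::n_gpus].
```

If a single T4 still runs out of memory, encode each shard in smaller chunks inside the worker.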