Add optional arg to specify device for Transformer model. #165
base: main
Conversation
Did you not need any other changes in generation.py?
I probably would, but I haven't been able to use any of the generative models. My machine already struggles with the 1B and 3B models and usually kills the 8B models for using too much memory.
Update #2: There are other ways to hard-code "cuda" usage (oops, silly me). I believe I found them all and updated them appropriately.
I went ahead and tried to find all the hard-coded "cuda" device calls and replace them appropriately.
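To illustrate the kind of change being described, here is a minimal before/after sketch; the tensor and helper names are made up for the example and are not from this PR:

```python
import torch

# Before: the device is hard-coded, so 'cpu' and 'mps' machines cannot run it.
#   mask = torch.full((1, 1), float("-inf"), device="cuda")

# After: the device comes from an argument instead.
def make_mask(device: torch.device) -> torch.Tensor:
    return torch.full((1, 1), float("-inf"), device=device)

mask = make_mask(torch.device("cpu"))
```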
models/llama3/reference_impl/multimodal/model.py

    torch.set_default_tensor_type(torch.cuda.HalfTensor)
else:
    torch.set_default_tensor_type(torch.float16)
torch.set_default_tensor_type seems to be deprecated starting from PyTorch 2.1; see the note in the API description: https://pytorch.org/docs/2.5/generated/torch.set_default_tensor_type.html#torch-set-default-tensor-type. Maybe it's better to use torch.set_default_dtype(torch.float16)?
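A minimal sketch of what the suggested replacement might look like; the set_defaults helper and the device check are illustrative, not part of the PR, and torch.set_default_device assumes PyTorch >= 2.0:

```python
import torch

def set_defaults(device: str) -> None:
    # Deprecated since PyTorch 2.1:
    #   torch.set_default_tensor_type(torch.cuda.HalfTensor)
    # Recommended replacement: set the default dtype and device separately.
    torch.set_default_dtype(torch.float16)
    if device == "cuda":
        torch.set_default_device("cuda")

set_defaults("cpu")
print(torch.empty(1).dtype)  # torch.float16
```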
    model.setup_cache(model_args.max_batch_size, torch.bfloat16)
else:
-   model = Transformer(model_args)
+   model = Transformer(model_args, device=device)
On my side it's not enough to apply this PR and pass device to Transformer on initialization. I still need to call model.to(device) in a subsequent step, i.e.:

    model = Transformer(model_args, device=device)
    model.to(device)

This does not quite make sense to me, though. If we pass device on class creation, then we should not also be required to call .to() to cast everything to the same device. I guess it implies that either .to() should be done at the end of Transformer's __init__(), or passing device should not be required at all and the initialization of the classes should be fixed so they are device-agnostic (created on CPU); then .to() should just work, since it seems to run recursively over all submodules. I think the reason it did not work is a couple of places where .cuda() is called on tensor creation.
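A minimal sketch of the device-agnostic pattern described above; SmallTransformer is a stand-in class, not the repo's Transformer. Parameters and buffers are created on CPU without any .cuda() calls, so a single .to(device) moves everything recursively:

```python
import torch
import torch.nn as nn

class SmallTransformer(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        # Created on CPU; no hard-coded .cuda() on tensor creation.
        self.register_buffer("freqs", torch.arange(dim, dtype=torch.float32))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x + self.freqs)

device = "mps" if torch.backends.mps.is_available() else "cpu"
model = SmallTransformer().to(device)  # .to() recurses over submodules and buffers
out = model(torch.randn(1, 64, device=device))
```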
Hi,
First off, I wanted to say thanks for publishing this work so openly!
For curiosity's sake, I've been trying to run the models locally on my Mac M1, so my device options are 'cpu' and 'mps'. Either way, I need a way to specify the device rather than always using cuda.
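A minimal sketch of the device selection this enables; the fallback order is illustrative, and the commented Transformer call mirrors the keyword proposed in this PR rather than a finalized API:

```python
import torch

# Pick the best available device: CUDA, then Apple's MPS, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# With this PR, the device can then be forwarded to the model:
# model = Transformer(model_args, device=device)
```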