torch.compile not working with device_map and multiple GPUs #2592

Open
2 of 4 tasks
sheepymeh opened this issue Mar 27, 2024 · 2 comments · May be fixed by #2609
Labels
big model inference (Relates to the big model inference capabilities), wip (Work in progress)

Comments


sheepymeh commented Mar 27, 2024

System Info

- `Accelerate` version: 0.28.0
- Platform: Linux-5.15.0-101-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 440.57 GB
- GPU type: NVIDIA GeForce RTX 3090
- `Accelerate` default config:
	Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

I am unable to use torch.compile with device_map when using multiple GPUs. I believe this issue is related to #2387.

This can be reproduced with the following (the model does not seem to matter):

from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224", device_map="auto")
model.compile(fullgraph=True)

inputs = processor(images=image, return_tensors="pt").to(model.device)
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
Stack trace
Traceback (most recent call last):
  File "/home/sheepymeh/transformers/test_vit.py", line 13, in <module>
    outputs = model(**inputs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1509, in _wrapped_call_impl
    return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 655, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 383, in _convert_frame_assert
    compiled_product = _compile(
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 646, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 244, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 562, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1033, in transform_code_object
    transformations(instructions, code_options)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 151, in _fn
    return fn(*args, **kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 527, in transform
    tracer.run()
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2128, in run
    super().run()
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 818, in run
    and self.step()
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 781, in step
    getattr(self, inst.opname)(inst)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 470, in wrapper
    return inner_fn(self, inst)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1252, in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars.items)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 652, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 294, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 248, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 81, in call_function
    return tx.inline_user_function_return(
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 688, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2261, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2376, in inline_call_
    tracer.run()
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 818, in run
    and self.step()
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 781, in step
    getattr(self, inst.opname)(inst)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 470, in wrapper
    return inner_fn(self, inst)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1213, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 652, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 248, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 81, in call_function
    return tx.inline_user_function_return(
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 688, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2261, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2376, in inline_call_
    tracer.run()
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 818, in run
    and self.step()
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 781, in step
    getattr(self, inst.opname)(inst)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 470, in wrapper
    return inner_fn(self, inst)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1213, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 652, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/variables/builtin.py", line 651, in call_function
    result = handler(tx, *args, **kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/variables/builtin.py", line 1069, in call_hasattr
    return obj.call_hasattr(tx, name)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/variables/base.py", line 306, in call_hasattr
    unimplemented(f"hasattr {self.__class__.__name__} {name}")
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/torch/_dynamo/exc.py", line 193, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: hasattr TupleVariable to

from user code:
   File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 161, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 356, in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(
  File "/home/sheepymeh/transformers/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 148, in send_to_device
    if is_torch_tensor(tensor) or hasattr(tensor, "to"):

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

I have verified that this issue does not occur if I move the model with .cuda() or otherwise force it to run on a single GPU.
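For context, the user-code frames at the bottom of the trace come from Accelerate's dispatch hooks, whose pre_forward recursively moves a submodule's inputs to its execution device. A simplified, hypothetical sketch of that kind of recursive dispatch (using a stand-in class instead of a real tensor; this is not Accelerate's actual implementation) shows why the duck-typed check quoted in the trace, `is_torch_tensor(tensor) or hasattr(tensor, "to")`, sits on the path Dynamo must trace:

```python
class FakeTensor:
    """Stand-in for a tensor: anything exposing .to(device)."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


def send_to_device(obj, device):
    # Duck-typed check, analogous to the line in the trace:
    #     if is_torch_tensor(tensor) or hasattr(tensor, "to"):
    # Under torch.compile, Dynamo has to trace this hasattr call,
    # which is where the TupleVariable error above is raised.
    if hasattr(obj, "to"):
        return obj.to(device)
    # Recurse into containers (namedtuples are deliberately ignored
    # in this sketch; handling them is what is_namedtuple is for).
    if isinstance(obj, (list, tuple)):
        return type(obj)(send_to_device(x, device) for x in obj)
    if isinstance(obj, dict):
        return {k: send_to_device(v, device) for k, v in obj.items()}
    return obj
```

For example, `send_to_device((FakeTensor(), 1), "cuda:1")` moves the fake tensor and leaves the int untouched.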

Expected behavior

I expected compilation to work the same way whether the model runs on a single GPU or is split across multiple GPUs.


SunMarc commented Apr 2, 2024

Hi @sheepymeh, thanks for reporting! I'll have a look ASAP. If you manage to find a fix, feel free to open a PR!

EDIT: I looked into it and this is quite a complex issue. In PyTorch 2.2.1, hasattr is not implemented for TupleVariable, so we get an error in the is_namedtuple function. I tried a few ways to avoid hasattr (e.g. a try/except block) but didn't manage to make it work. If we bypass the is_namedtuple function, compile works. In the PyTorch nightly (the future 2.4), hasattr on TupleVariable is implemented, but we hit another error (breaking changes...). So fixing this issue is quite complex. I'll have a look later if people are interested in this feature!
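For reference, the problematic check is shaped roughly like `isinstance(data, tuple) and hasattr(data, "_asdict") and hasattr(data, "_fields")`. One hypothetical alternative (the function name and approach are illustrative only, and whether Dynamo traces it cleanly is exactly the open question above) is to inspect the class rather than call hasattr on the value being traced:

```python
from collections import namedtuple


def is_namedtuple_via_type(data):
    # Hypothetical alternative: look at the type, not the instance,
    # so no hasattr call runs on the value Dynamo is tracing.
    # dir() walks the MRO, so namedtuple subclasses are covered too.
    return isinstance(data, tuple) and "_fields" in dir(type(data))
```

This relies on `_fields` being a class attribute of every class generated by `collections.namedtuple`, so it distinguishes namedtuples from plain tuples without probing the instance.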

SunMarc linked a pull request (Apr 2, 2024) that may close this issue.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

SunMarc added the "big model inference" and "wip" labels Apr 29, 2024