Enable cpu offload with weights inside the module #2214
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for tackling this annoying issue. Overall, I understand too little about the mechanism used by accelerate to control this, so I'd leave that part of the review to others.
This PR adds the possibility to perform cpu offload with the weights stored inside the module.
Sorry for being dense, but where exactly is that happening?
not sure about the naming of the arg as it can be confusing

Yes, I'd definitely rename the argument, especially since we already have a cpu_offload function in the same file.
@@ -468,6 +478,8 @@ def attach_align_device_hook_on_blocks(
offload (`bool`, *optional*, defaults to `False`):
    Whether or not the weights should be offloaded after the forward pass. It can be one boolean for the whole
    module, or a dictionary mapping module name to boolean.
cpu_offload (`Union[bool, Dict[str, bool]]`, *optional*, defaults to `False`):
    Whether the weights offloaded on the cpu should be kept in the module or not.
The docstring misses explaining what passing a dict here does for this option.
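Presumably the dict form follows the same module-name-to-boolean convention as `offload` above; this is a guess from the `Union[bool, Dict[str, bool]]` type hint, not something the docstring states:

```python
# Hypothetical dict form, guessed from the type hint: keep linear1's offloaded
# weights inside the module (on cpu), but not linear2's.
cpu_offload = {"linear1": True, "linear2": False}
```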
dispatch_model(model, device_map, offload_dir=tmp_dir, cpu_offload=True)

self.assertEqual(model.linear1.weight.device, torch.device("meta"))
self.assertEqual(model.batchnorm.weight.device, torch.device("cpu"))
The new behavior of getting "cpu" here instead of "meta" looks more intuitive to me.
Yes, this is what we are aiming for in this PR! We want to leave the module on the `cpu` device and not on the `meta` device.
@@ -302,6 +302,7 @@ def dispatch_model(
    offload_dir: Optional[Union[str, os.PathLike]] = None,
    offload_index: Optional[Dict[str, str]] = None,
    offload_buffers: bool = False,
    cpu_offload: bool = False,
In general, should newly added parameters be placed last in case someone calls this function with purely positional arguments?
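A toy illustration (nothing to do with accelerate's actual signatures or call sites) of the failure mode that placing new parameters last avoids:

```python
# Inserting a new keyword argument in the middle of a signature silently shifts
# purely positional callers; no error is raised, the values just land elsewhere.
def dispatch_old(model, device_map, offload_dir=None, offload_buffers=False):
    return {"offload_buffers": offload_buffers}


def dispatch_new(model, device_map, offload_dir=None, cpu_offload=False, offload_buffers=False):
    return {"cpu_offload": cpu_offload, "offload_buffers": offload_buffers}


# A caller that used to set offload_buffers positionally...
print(dispatch_old("model", {}, "offload", True))  # {'offload_buffers': True}
# ...now toggles cpu_offload instead.
print(dispatch_new("model", {}, "offload", True))  # {'cpu_offload': True, 'offload_buffers': False}
```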
@@ -230,12 +233,12 @@ def __init__(
):
    self.execution_device = execution_device
    self.offload = offload
    self.cpu_offload = cpu_offload
Would it make sense to pass both `offload` and `cpu_offload`? It seems the former would take precedence over the latter. Maybe this could be checked or documented?
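If it helps, here is a sketch of one way to make that precedence explicit; this is not the code from the PR, and the real hook takes more arguments than shown:

```python
import warnings


# Sketch only: guard/document the interaction between `offload` and `cpu_offload`,
# assuming (per the comment above) that `offload` wins when both are set.
class AlignDevicesHookSketch:
    def __init__(self, execution_device=None, offload=False, cpu_offload=False):
        if offload and cpu_offload:
            warnings.warn("Both `offload` and `cpu_offload` were passed; `offload` takes precedence.")
        self.execution_device = execution_device
        self.offload = offload
        self.cpu_offload = cpu_offload
```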
elif self.cpu_offload:
    for name, _ in named_module_tensors(module, recurse=self.place_submodules):
        set_module_tensor_to_device(module, name, "cpu")
Is special handling for Linear8bitLt required, similar to above?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@@ -302,6 +302,7 @@ def dispatch_model(
    offload_dir: Optional[Union[str, os.PathLike]] = None,
    offload_index: Optional[Dict[str, str]] = None,
    offload_buffers: bool = False,
    cpu_offload: bool = False,
I think it can be confusing, as CPU offloading is already indicated in the `device_map`. IMO, ideally there should not be any argument added, and by default the weights of modules offloaded to RAM should be on the `cpu` device, not `meta`. But this is kind of a breaking change in case anybody is assuming that, by default, the attached weights are on `meta` and `weights_map` holds the true weights.
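For context, a mocked illustration (not real accelerate code) of the kind of downstream assumption that would break: code written against the current default expects the attached weight to be a `meta` placeholder and reads the true values from the hook's `weights_map`. The `_hf_hook` attribute name matches what accelerate attaches today; everything else here is a stand-in.

```python
import torch
from torch import nn


# Stand-in for an accelerate hook: only the weights_map lookup matters here.
class FakeHook:
    def __init__(self, real_weight):
        self.weights_map = {"weight": real_weight}


linear = nn.Linear(3, 4)
real_weight = linear.weight.data.clone()
# Mimic the current offload behavior: the attached weight becomes a meta placeholder.
linear.weight = nn.Parameter(torch.empty_like(linear.weight, device="meta"))
linear._hf_hook = FakeHook(real_weight)

# Downstream code relying on the current default:
assert linear.weight.device.type == "meta"
w = linear._hf_hook.weights_map["weight"]  # true values live outside the module

# If offloaded weights stayed on the cpu device by default, this lookup pattern
# would have to read the parameter itself instead.
print(w.shape)
```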
What does this PR do?

This PR adds the possibility to perform cpu offload with the weights stored inside the module. You just need to pass `cpu_offload=True` to `dispatch_model` (see the sketch below). (Not sure about the naming of the arg as it can be confusing.)

Before this PR, all offloaded modules were placed on the `meta` device, and the weights were either stored in a dict (cpu offload) or a mmap (disk offload). We would then move the modules to the execution device, with their respective values taken from the dict/mmap, during the `forward`.

For the user, it seems a little counter-intuitive to put weights in a dict in the cpu offload case. Moreover, leaving these weights on the modules should not degrade performance during inference at all. Offloading has created a number of issues about parameters being on the `meta` device. While this PR does not completely solve issues related to the `meta` device, it should cover most cases, since users don't use disk offload that much.

For now, the default value is `False`, but I would like to make it the default behavior and extend it to Transformers. LMK if this makes sense.

cc @mfuntowicz since you had an issue with offloaded model + quantization
cc @LysandreJik for visibility
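A minimal sketch of the proposed usage, mirroring the module names in the test excerpt above (exact layer sizes are made up for the sketch); it assumes a machine with at least one GPU, and the `cpu_offload` argument name is the one proposed in this PR and may still change:

```python
import torch
from torch import nn
from accelerate import dispatch_model


class ModelForTest(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(3, 4)
        self.batchnorm = nn.BatchNorm1d(4)
        self.linear2 = nn.Linear(4, 5)

    def forward(self, x):
        return self.linear2(self.batchnorm(self.linear1(x)))


model = ModelForTest()
# batchnorm is offloaded to RAM; with cpu_offload=True its weights stay on the
# cpu device instead of being swapped for meta tensors backed by a weights map.
device_map = {"linear1": 0, "batchnorm": "cpu", "linear2": 0}
dispatch_model(model, device_map, cpu_offload=True)

print(model.batchnorm.weight.device)  # expected: cpu
```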
Partially solves #1190
TODO: