Reduce memory by using `all_gather_into_tensor` #1968

muellerzr · 2023-09-13T18:21:41Z

What does this PR do?

torch 1.13 made `all_gather_into_tensor` public (it was previously `_all_gather_base`); it is a much more memory-efficient version of gather. One thing to note, mentioned in the PR here: they did not make this the base of gather, since the regular gather handles uneven inputs automatically. Since Accelerate checks for and handles uneven inputs separately, we can safely use this API. I discovered this through the original DeepSpeed PR.
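
As a rough illustration of the difference described above, the sketch below contrasts the two collectives. It is not the code from this PR: `gather_flat`, `gather_list`, `gather_one`, and `init_single_process_group` are hypothetical names, and the backend check only mirrors the `backend != "gloo"` condition visible in the review diff. It assumes every rank passes a tensor of the same shape with at least one dimension.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def gather_flat(tensor: torch.Tensor) -> torch.Tensor:
    """Gather along dim 0 into one preallocated buffer.

    Assumes a backend that supports `all_gather_into_tensor` (e.g. NCCL)
    and identical shapes on every rank.
    """
    world_size = dist.get_world_size()
    tensor = tensor.contiguous()
    # A single flat output buffer: no per-rank list, no extra copy from torch.cat.
    output = torch.empty(
        world_size * tensor.numel(), dtype=tensor.dtype, device=tensor.device
    )
    dist.all_gather_into_tensor(output, tensor)
    # Restore the trailing dims; dim 0 becomes world_size * tensor.shape[0].
    return output.view(-1, *tensor.shape[1:])


def gather_list(tensor: torch.Tensor) -> torch.Tensor:
    """List-based gather: one buffer per rank, then a concatenated copy."""
    world_size = dist.get_world_size()
    buffers = [torch.empty_like(tensor) for _ in range(world_size)]
    dist.all_gather(buffers, tensor)
    return torch.cat(buffers, dim=0)


def gather_one(tensor: torch.Tensor) -> torch.Tensor:
    """Pick a path by backend (assumption: gloo takes the list-based fallback)."""
    if dist.get_backend() != "gloo":
        return gather_flat(tensor)
    return gather_list(tensor)


def init_single_process_group() -> None:
    """Hypothetical helper: start a 1-process gloo group so the sketch runs on CPU."""
    store_path = os.path.join(tempfile.mkdtemp(), "store")
    dist.init_process_group(
        "gloo", init_method=f"file://{store_path}", rank=0, world_size=1
    )
```

Launched with several processes on NCCL, `gather_one` would take the `gather_flat` branch; the single-process gloo helper exists only so the fallback path can be tried on a CPU-only machine.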

For a general idea of just how much memory can be saved, I ran a small, simple test:

import time
import torch
from accelerate import PartialState
from accelerate.utils import gather

def convert_bytes(size):
    "Converts `size` from bytes to the largest possible unit"
    for x in ["bytes", "KB", "MB", "GB", "TB"]:
        if size < 1024.0:
            return f"{round(size, 2)} {x}"
        size /= 1024.0

    return f"{round(size, 2)} PB"

state = PartialState()
tensor = torch.rand((64, 224, 224, 64), device=state.device)

# Using `PartialState`
start_time = time.time()
tensor = gather(tensor)
end_time = time.time()

with state.main_process_first():
    print(f"Process {state.process_index} memory allocated after: {convert_bytes(torch.cuda.max_memory_allocated(state.device))}")
    print(f"Process {state.process_index} time: {end_time - start_time}")

The results can be summarized as such:

Before:
Total CUDA memory allocated: 3.83 GB

After:
Total CUDA memory allocated: 2.3 GB
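
These two totals are consistent with a simple buffer count. The sketch below assumes two processes and the default float32 dtype of `torch.rand`; neither the process count nor the exact set of buffers alive at the peak is stated in this PR, so treat it as a back-of-the-envelope check rather than a measurement.

```python
from math import prod


def convert_bytes(size):
    "Converts `size` from bytes to the largest possible unit"
    for x in ["bytes", "KB", "MB", "GB", "TB"]:
        if size < 1024.0:
            return f"{round(size, 2)} {x}"
        size /= 1024.0
    return f"{round(size, 2)} PB"


shape = (64, 224, 224, 64)  # tensor shape used in the test script
bytes_per_element = 4       # float32, the default dtype of torch.rand
world_size = 2              # ASSUMPTION: not stated in the PR

local = prod(shape) * bytes_per_element  # one rank's input tensor
gathered = world_size * local            # the full gathered result

# List-based all_gather followed by torch.cat:
# input + one buffer per rank + the concatenated copy.
before = local + world_size * local + gathered

# all_gather_into_tensor: input + a single preallocated output.
after = local + gathered

print(convert_bytes(local))   # 784.0 MB
print(convert_bytes(before))  # 3.83 GB
print(convert_bytes(after))   # 2.3 GB
```

Under these assumptions the saving equals the size of the per-rank buffer list, which the single-output collective never allocates.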

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@BenjaminBossan @LysandreJik

HuggingFaceDocBuilderDev · 2023-09-13T18:26:39Z

The documentation is not available anymore as the PR was closed or merged.

BenjaminBossan

Nice improvement. Before proceeding, I have a few questions, please take a look. Also some minor suggestions, but those are not blockers.

BenjaminBossan · 2023-09-18T13:03:44Z

src/accelerate/utils/operations.py

+        state = PartialState()
+
+        if state.backend is not None and state.backend != "gloo":
+            output_tensors = torch.zeros(


Why use torch.zeros instead of torch.empty_like as previously?

muellerzr · 2023-10-02

We're doing something slightly different here with the new API: this gather works along a different tensor dim than before, which is more efficient.

muellerzr · 2023-10-02

Added a comment

BenjaminBossan · 2023-10-02

Not sure, but won't this allocate more memory than previously?

muellerzr · 2023-10-02

It's actually the opposite, hence what we're doing here.

pacman100

Thank you @muellerzr for working on this, I have the same comments as Benjamin.

BenjaminBossan

In general looks good, thanks for adding support for this feature. I have only one question, but feel free to merge.

BenjaminBossan · 2023-10-02T18:09:46Z

src/accelerate/utils/operations.py

+        state = PartialState()
+
+        if state.backend is not None and state.backend != "gloo":
+            output_tensors = torch.zeros(


Not sure, but won't this allocate more memory than previously?

muellerzr added 4 commits September 13, 2023 17:53

all_gather_into_tensor

f1d4539

Cleanup

82a34ea

Reduce memory on non-gloo

d484988

Fin

fd35a7a

muellerzr added the enhancement label Sep 13, 2023

muellerzr requested review from BenjaminBossan and LysandreJik September 13, 2023 18:21

Check for backend too on cpu

0a3bd1c

BenjaminBossan reviewed Sep 18, 2023

pacman100 reviewed Sep 29, 2023

muellerzr added 5 commits October 2, 2023 11:43

CPU comment

7a36532

Change scope for performance

1977e5c

Bring back zeros after remembering why

a425a89

Add comment

cd067f5

Add comment

5449149

muellerzr requested a review from BenjaminBossan October 2, 2023 16:14

BenjaminBossan approved these changes Oct 2, 2023

muellerzr requested a review from pacman100 October 2, 2023 18:15

muellerzr added 2 commits October 2, 2023 18:23

Use empty

5306fc1

Comment

cd70333

muellerzr merged commit 73640d0 into main Oct 10, 2023
26 checks passed

muellerzr deleted the gather-op branch October 10, 2023 14:10
