2.2 backport PR request list #6036

Open
zpcore opened this issue Dec 6, 2023 · 37 comments

Comments

@zpcore
Collaborator

zpcore commented Dec 6, 2023

This issue tracks backports to the 2.2 release.

For any PRs you want to backport to 2.2, please reply with the following:

  • Original PR link
  • Reason to backport
  • 2.2 backport PR link
@JackCaoG
Collaborator

JackCaoG commented Dec 6, 2023

Master PR: #5956
Status: Merged

Backport PR: #6055
Status: Merged

Reason:
AWS needs this PR to land to enable coalesced reduce_scatter.

@JackCaoG
Collaborator

JackCaoG commented Dec 6, 2023

Master PR: #5958
Status: Merged

Backport PR: #6037
Status: Merged

Reason:
Enhance the PT_XLA_DEBUG tool to truncate the Python frame count.

@JackCaoG
Collaborator

JackCaoG commented Dec 7, 2023

Master PR: #6039
status: Merged

Backport PR: #6095
Status: Merged

Reason:
Update the troubleshooting doc with pt_xla_debug.

@jonb377
Collaborator

jonb377 commented Dec 7, 2023

Master PR: #5803
Status: Merged

Backport PR: #6050
Status: Merged

Reason:
Persistent compilation caching - env hash

@jonb377
Collaborator

jonb377 commented Dec 7, 2023

Master PR: #5804
Status: Merged

Backport PR: #6065
Status: Merged

Reason:
Persistent compilation caching - enable via environment

@jonb377
Collaborator

jonb377 commented Dec 7, 2023

Master PR: #6046
Status: Merged

Backport PR: #6074
Status: Merged

Reason:
Persistent compilation caching - python API and docs

@jeffhataws
Collaborator

jeffhataws commented Dec 7, 2023

Master PR: #5931
Status: Merged

Backport PR: #6054
Status: Merged

Reason:
AWS: Add a mesh_spmd for application to pass in topology, needed for pipeline parallel.

@jeffhataws
Collaborator

jeffhataws commented Dec 7, 2023

Master PR: #5950
Status: Merged

Backport PR: 1271964 (got in before the branch cut)
Status: Merged

Reason:
AWS: enable all-gather coalescing

@JackCaoG
Collaborator

JackCaoG commented Dec 7, 2023

Master PR: #5922
status: merged

Backport PR: #6052
Status: merged

Reason:
Fix some Dynamo model failures.

@will-cromar
Collaborator

will-cromar commented Dec 8, 2023

Master PR: #6060
status: merged

Backport PR: #6066
status: merged

Reason: Fix transfers from eager GPU to XLA device

@ManfeiBai
Collaborator

ManfeiBai commented Dec 8, 2023

Master PR: #6053
status: merged

Backport PR: #6079
status: merged

Reason: Hot-fix for the random "duplicated trace registered" TPU CI failure

@jeffhataws
Collaborator

jeffhataws commented Dec 11, 2023

Master PR: #6059
status: Merged

Backport PR: #6108
status: Merged

Reason: Add a separate out-of-place all-gather. Add a missing test for all-gather coalesce out, and fix a bug in the list size check for the output != None case.

@jeffhataws
Collaborator

jeffhataws commented Dec 11, 2023

Master PR: #6058
status: Merged

Backport PR: #6109
Status: Merged

Reason: Add a separate out-of-place reduce-scatter and accompanying test.

@jonb377
Collaborator

jonb377 commented Dec 11, 2023

Master PR: #6075
Status: Merged

Backport PR: #6091
Status: Merged

Reason: Eliminate flakiness in profiler tests

@JackCaoG
Collaborator

JackCaoG commented Dec 11, 2023

Master PR: #6071 (comment)
Status: Merged

Backport PR: #6094
Status: Merged

Reason: Fix transfers from eager GPU to XLA device for scalar tensor

@JackCaoG
Collaborator

JackCaoG commented Dec 13, 2023

Master PR: #6123
Status: Merged

Backport PR: #6129
Status: Merged

Reason: Bug fix for diagonal_scatter op.

@jonb377
Collaborator

jonb377 commented Dec 13, 2023

Master PR: #6148
Status: Merged

Backport PR: #6167
Status: Merged

Reason: Bug fix for persistent cache

@jonb377
Collaborator

jonb377 commented Dec 13, 2023

Master PR: #6116
Status: Merged

Backport PR: #6120
Status: Merged

Reason: Remove flaky test from CI

@jeffhataws
Collaborator

jeffhataws commented Dec 14, 2023

Master PR: #5936
Status: Merged

Backport PR: #6237
Status: Merged

Reason: Fix a bug in how the zero-1 optimizer infers the local ranks with PP and DP.

@alanwaketan
Collaborator

alanwaketan commented Dec 14, 2023

Master PR: #6097
Status: Merged

Backport PR: #6157
Status: Merged

Reason: Needed for v5e GA.

@alanwaketan
Collaborator

alanwaketan commented Dec 14, 2023

Master PR: #6101
Status: Merged

Backport PR: #6158
Status: Merged

Reason: Needed for FSDPv2.

@qihqi
Collaborator

qihqi commented Dec 14, 2023

Master PR: #6111
Status: Merged

Backport PR: #6164
Status: Merged

Reason: Upstream changed torch.export.

@qihqi
Collaborator

qihqi commented Dec 15, 2023

Master PR: #6103
Status: Merged

Backport PR: #6172
Status: Merged

Reason: Bug fix: pow currently computes completely wrong results for an int tensor with a float exponent.

@wonjoolee95
Collaborator

wonjoolee95 commented Dec 15, 2023

Master PR: #6110
Status: Merged

Backport PR: #6173
Status: Merged

Reason: Addresses behavior gap -- makes torch_xla's isnan op behavior consistent with torch's isnan

@ManfeiBai
Collaborator

ManfeiBai commented Dec 15, 2023

Master PR: #6166
Status: Merged

Backport PR: #6329
Status: Merged

Reason: Promote int to float for the tanh operation (consistent with PyTorch)

@jonb377
Collaborator

jonb377 commented Dec 15, 2023

Master PR: #6168
Status: Merged

Backport PR: #6204
Status: Merged

Reason: Add default XLA flags for perf on v5

@qihqi
Collaborator

qihqi commented Dec 15, 2023

Master PR: #6056
Status: Merged

Backport PR: #6185
Status: Merged

Reason: This one removes an excessive log message.

@alanwaketan
Collaborator

alanwaketan commented Dec 15, 2023

Master PR: #6122
Status: Merged

Backport PR: #6187
Status: Merged

Reason: This one introduces FSDPv2.

@zpcore
Collaborator Author

zpcore commented Dec 21, 2023

Master PR: #6221
Status: Merged

Backport PR: #6228
Status: Merged

Reason: Fix the xl-ml test expecttest issue.

@ManfeiBai
Collaborator

ManfeiBai commented Dec 26, 2023

Master PR: #6224
Status: Merged

Backport PR: #6238
Status: Merged

Reason: Promote int to float for the acos operation (consistent with PyTorch)

@jeffhataws
Collaborator

jeffhataws commented Jan 10, 2024

Master PR: #6247, #6268
Status: Merged and Merged

Backport PR: #6278
Status: Merged

Reason: Bug fix for segfault when running test_zero1 on GPU (#6260)

(Would you mind helping to verify the modification of this comment, @jeffhataws? Thanks!)

@ManfeiBai
Collaborator

Master PR: #6300
Status: Merged

Backport PR: #6304
Status: Merged

Reason: Correct order of constant vs args

@ManfeiBai
Collaborator

ManfeiBai commented Jan 19, 2024

Master PR: #6263
Status: Merged

Backport PR: #6329
Status: Merged

Reason: Fix #6252 for local test

@yeounoh
Contributor

yeounoh commented Jan 19, 2024

Master PR: #6326
Status: merged

Backport PR: #6330
Status: merged

Reason: Fix #6319 (regression)

@vanbasten23
Collaborator

vanbasten23 commented Jan 19, 2024

Master PR: #6321
Status: merged

Backport PR: #6334
Status: merged

Reason: Fix regression (see #6320 for detail)

@jonb377
Collaborator

jonb377 commented Jan 25, 2024

Master PR: #6356
Status: merged

Backport PR: #6378
Status: merged

Reason: Update documentation for CheckpointManager

@alanwaketan
Collaborator

alanwaketan commented Jan 29, 2024

Master PR: #6386
Status: merged

Backport PR: #6408
Status: merged

Reason: Update documentation for FSDPv2.
