microsoft / DeepSpeed Public

Issues: microsoft/DeepSpeed

978 Open 1,876 Closed

Issues list

[BUG] using deepspeed slower inference time

Labels: bug, inference

#6818 opened Dec 4, 2024 by williamlin0518

deepspeed installation problem

#6817 opened Dec 4, 2024 by sagie-dekel

[QST] MoE auxiliary loss

#6816 opened Dec 4, 2024 by osayamenja

[BUG] DeepSpeed accuracy issue for torch.compile if activation checkpoint function not compiler disabled

Labels: bug, training

#6811 opened Dec 1, 2024 by NirSonnenschein

[BUG] Enabling drop_tokens in MoE layer causes inference to hang

Labels: bug, inference

#6809 opened Nov 29, 2024 by Shamauk

[Questions] Why Ulysess need all2all for QKV, but RingAttention just need KV under context parallel ?

#6808 opened Nov 29, 2024 by elevenxiang

multiple runs on same machine, with ctrl+c, all runs are killed

#6807 opened Nov 29, 2024 by ysyyork

[BUG] Getting "SymIntArrayRef expected to contain only concrete integers" error when > 1 GPU

Labels: bug, training

#6806 opened Nov 28, 2024 by rileyhun

[BUG] deepspeed inference for llama3.1 70b for 2 node, each node with 2 gpu

Labels: bug, inference

#6805 opened Nov 28, 2024 by rastinrastinii

[BUG] Unnecessary memory copy in paramater partition in ZeRO3

Labels: bug, training

#6804 opened Nov 28, 2024 by yingtongxiong

Use DS4Sci_EvoformerAttention and torch.util.checkpoint.checkpoint at the same time during training

#6802 opened Nov 28, 2024 by cbyzju

nv-torch-nightly-v100 CI test failure

Labels: ci-failure

#6801 opened Nov 28, 2024 by github-actions bot

nv-ds-chat CI test failure

Labels: ci-failure

#6800 opened Nov 28, 2024 by github-actions bot

deepspeed setup for requiring grads on the input (explainability) without huge increase in memory over all gpus

#6798 opened Nov 27, 2024 by GonyRosenman

Question about using Autotuner with ZeRO and tensor parallelism

#6796 opened Nov 27, 2024 by rlanday

Does ZeRO++ Work on AMD GPU Mi200?

#6795 opened Nov 27, 2024 by unavailableun

AssertionError: no sync context manager is incompatible with gradientpartitioning logic of ZeRo stage 3

#6793 opened Nov 26, 2024 by 66RomanReigns

nv-nightly CI test failure

Labels: ci-failure

#6790 opened Nov 26, 2024 by github-actions bot

[REQUEST] Let ZeRO-offload use CPU and GPU parallelly

Labels: enhancement

#6778 opened Nov 23, 2024 by fzyzcjy

[BUG] [Fix Suggestion] Uneven head sequence parallelism

Labels: bug, training

#6774 opened Nov 21, 2024 by Eugene29

Labels: bug, training

#6772 opened Nov 20, 2024 by traincheck-team

#6771 opened Nov 20, 2024 by traincheck-team

#6770 opened Nov 20, 2024 by traincheck-team

grad is None

Labels: bug, training

#6768 opened Nov 20, 2024 by suanflower

[BUG] clip_grad_norm for zero_optimization mode is not working

Labels: bug, training

#6767 opened Nov 20, 2024 by chengmengli06
