microsoft / Megatron-DeepSpeed Public

forked from NVIDIA/Megatron-LM

Fork 344
Star 1.9k

Issues: microsoft/Megatron-DeepSpeed

126 Open 56 Closed

Issues list

Model conversion problem

#449 opened Sep 26, 2024 by yuanzhiyong1999

Async allreduce for tensor-parallel

#447 opened Sep 23, 2024 by drcanchi

[TRACKER] Customer support related PR tracker for Intel devices

#446 opened Sep 20, 2024 by delock

7 of 12 tasks

how to calculate the training throughput

#444 opened Sep 12, 2024 by bigtree2020

llama3 and llama3.1 support

#443 opened Sep 10, 2024 by fmiao2372

[Bug] Missing weight gradients from LinearWithGradAccumulationAndAsyncCommunication when Zero Bubble Pipeline Parallelism Is disabled

#442 opened Sep 1, 2024 by mksit

Optimizer problem when using finetune_llama.sh

#440 opened Aug 28, 2024 by Kaiizx

zero3 The checkpoint being loaded used a DP world size of 8 but the current world size is 16. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported.

#439 opened Aug 26, 2024 by ArtificialZeng

Why pretrain_llama_distributed.sh use pretrain_gpt.py ?

#437 opened Aug 22, 2024 by BrucePeng92

[bug]: ipex install breaks non xpu devices

#435 opened Aug 8, 2024 by saforem2

A tutorial to help you finetune LLama-2-7b using this repository full of garbage code with ZeRO2/3 enabled.

#430 opened Jul 25, 2024 by LLMChild

How to resume training between GPTModel() checkpoint and GPTModelPipe() checkpoint?

#405 opened Jun 27, 2024 by tiggerwu

MOE TFLOPS calculation

#398 opened Jun 5, 2024 by yingzhao27

why moe can not use zero3

#397 opened Jun 4, 2024 by kuangdao

Inquiry on Sequence Parallel Support for VocabParallelEmbedding

#389 opened May 18, 2024 by qinxiangyujiayou

about the optimizer param group

#387 opened May 17, 2024 by L-hongbin

DeepSpeed: a mountain of spaghetti code (original title: 屎山代码DeepSpeed)

#386 opened May 11, 2024 by ControllableGeneration

Sequence Parallel is incompatible with Rotary Positional Embedding

#385 opened May 9, 2024 by anogkongda

Spurious all gather performance drop.

#384 opened Apr 29, 2024 by etiennemlb

Call for Conversion from Huggingface to Megads with MoE

#381 opened Apr 24, 2024 by ControllableGeneration

Expert deepcopy raises PickleError

#380 opened Apr 23, 2024 by sxontheway

AttributeError: 'Namespace' object has no attribute 'deepspeed_config_dict'. Did you mean: 'deepspeed_config'? && batch = next(self.data_iterator)

#379 opened Apr 20, 2024 by hi20240217

Assertion failure when there are more than 255 tokenized data files (assert num_datasets < 255 in blendable_dataset.py)

#377 opened Apr 17, 2024 by Jeronymous

Pipeline parallelism + CPU offload?

#369 opened Mar 21, 2024 by webber26232

[BUG] Problems with Mixture-of-Experts (MoE)

#367 opened Mar 16, 2024 by nikit-srivastava
