NotImplementedError: Cannot copy out of meta tensor; no data! #26510

Closed · ari9dam opened this issue Sep 30, 2023 · 15 comments
ari9dam commented Sep 30, 2023

System Info

transformers==4.34.0.dev0
accelerate==0.23.0
torch==2.0.1
cuda==11.7

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import transformers

# model_path is not defined in the report; it points to the Mistral checkpoint being loaded.
model = transformers.MistralForCausalLM.from_pretrained(model_path)

Error:

Traceback (most recent call last):
  File "./trainer.py", line 198, in <module>
    train()
  File "./trainer.py", line 152, in train
    model = transformers.MistralForCausalLM.from_pretrained(
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3301, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3689, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers/modeling_utils.py", line 741, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!
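For readers unfamiliar with the error itself: it is raised when PyTorch tries to materialize a tensor that lives on the meta device, which carries only shape and dtype but no storage. A minimal standalone illustration, unrelated to this specific checkpoint:

import torch

t = torch.empty(2, 2, device="meta")  # a meta tensor has metadata but no data
t.to("cpu")  # raises NotImplementedError: Cannot copy out of meta tensor; no data!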

Expected behavior

The model loads successfully.

mdazfar2 commented Oct 1, 2023

Hello sir, can you assign this issue to me?

ari9dam (Author) commented Oct 1, 2023

It does not give me the option to assign anyone. Sorry!

LysandreJik (Member) commented

@mdazfar2 feel free to open a PR and link it to this issue if you'd like to work on it!

ari9dam (Author) commented Oct 3, 2023

It works without FSDP (i.e., with DDP); with FSDP it does not work.

mdazfar2 commented Oct 3, 2023

@LysandreJik Yeah, okay, I will do it now.

ari9dam (Author) commented Oct 3, 2023

It works with DeepSpeed Stage 2 as well. The error only occurs when training with FSDP.
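For context, the thread does not show the exact launch commands being compared in the last two comments. A rough sketch of the three setups, assuming accelerate's standard launcher flags and the trainer.py entry point from the traceback (both are assumptions, not taken from the thread):

# Plain DDP run (reported working); flags and process count are illustrative.
accelerate launch --multi_gpu --num_processes 4 trainer.py

# DeepSpeed ZeRO Stage 2 run (reported working).
accelerate launch --use_deepspeed --zero_stage 2 --num_processes 4 trainer.py

# FSDP run (reported failing with the meta-tensor error).
accelerate launch --use_fsdp --num_processes 4 trainer.py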

dannyhung1128 commented

Hit the same problem on Slurm as well.

ari9dam (Author) commented Oct 4, 2023 via email

ari9dam (Author) commented Oct 4, 2023

What are the possible reasons? I could run my code with 4.33.1. Is it accelerate?
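One way to answer the "is it accelerate?" question is to bisect by pinning one package at a time. This is only a suggested diagnostic, not something done in the thread, and the exact versions below are illustrative:

# Keep accelerate fixed and roll transformers back to the version reported as working:
pip install "transformers==4.33.1" "accelerate==0.23.0"

# Then keep transformers at the failing version and try a different accelerate release:
pip install "accelerate==0.22.0"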

LysandreJik (Member) commented

Maybe cc @muellerzr as well

github-actions bot closed this as completed Nov 8, 2023
amyeroberts reopened this Nov 8, 2023
huggingface deleted a comment from the github-actions bot Nov 8, 2023
amyeroberts (Collaborator) commented

Gentle ping @muellerzr @pacman100

pacman100 (Contributor) commented Nov 8, 2023

Hello, using the latest releases of transformers (4.35.0) and Accelerate (0.24.1), I am unable to reproduce the issue.

1. Code issue_26510.py:
import transformers

model_path = "mistralai/Mistral-7B-Instruct-v0.1"
model = transformers.MistralForCausalLM.from_pretrained(model_path)
2. Accelerate config via accelerate config --config_file issue_26510.yaml:
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: 1
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
3. Launch command:
accelerate launch --config_file issue_26510.yaml issue_26510.py
4. Output logs:
Downloading shards: 100%|█| 2/2 [00:00<00:00,  9.46
Downloading shards: 100%|█| 2/2 [00:00<00:00,  9.70
Downloading shards: 100%|█| 2/2 [00:00<00:00, 12.09
Downloading shards: 100%|█| 2/2 [00:00<00:00,  7.83
Loading checkpoint shards: 100%|█| 2/2 [00:12<00:00,  6.19s/it
Loading checkpoint shards: 100%|█| 2/2 [00:12<00:00,  6.22s/it
Loading checkpoint shards: 100%|█| 2/2 [00:12<00:00,  6.15s/it
Loading checkpoint shards: 100%|█| 2/2 [00:12<00:00,  6.12s/it
5. This was experienced initially because support for RAM-efficient loading of pretrained models was not compatible with a few models, such as Whisper. Therefore, the PRs "Make fsdp ram efficient loading optional" (#26631) and "Make fsdp ram efficient loading optional" (accelerate#2037) added a config parameter to make it optional. See the config param `` and set it to False in case RAM-efficient loading of the model fails. The docs for this config parameter are at https://huggingface.co/docs/accelerate/usage_guides/fsdp#how-it-works-out-of-the-box. The relevant point is quoted below:

CPU RAM Efficient Model loading: If True, only the first process loads the pretrained model checkpoint while all other processes have empty weights. Only applicable for 🤗 Transformers models. This should be set to False if you experience errors when loading the pretrained 🤗 Transformers model via the from_pretrained method. When using this, Sync Module States needs to be True, else all the processes except the main process would have random empty weights, leading to unexpected behaviour during training.
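In YAML terms, using the key names from the config shown earlier in this comment, the quoted guidance reads roughly as follows (a sketch of the relevant keys only, not a full config):

fsdp_config:
  # Set to false if from_pretrained fails with the meta-tensor error.
  fsdp_cpu_ram_efficient_loading: false
  # Per the quote above, this must stay true whenever fsdp_cpu_ram_efficient_loading is true,
  # so that non-main processes receive the real weights instead of random empty ones.
  fsdp_sync_module_states: true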

github-actions bot commented Dec 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

kwonmha (Contributor) commented Dec 11, 2023

The missing config parameter name (``) in the comment above is fsdp_cpu_ram_efficient_loading. :)
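With the name filled in, the minimal change to the reproduction setup above is a one-line edit to the YAML followed by the same launch command; this just restates the earlier guidance concretely:

# In issue_26510.yaml, change
#   fsdp_cpu_ram_efficient_loading: true
# to
#   fsdp_cpu_ram_efficient_loading: false
# and relaunch:
accelerate launch --config_file issue_26510.yaml issue_26510.py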

github-actions bot commented Jan 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
