Has anyone managed to train successfully? #19
Comments
Have you tried setting stage3_prefetch_bucket_size to 15099494 in the DeepSpeed config?
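(A minimal sketch of that change, assuming ds_config/stage3.json has the usual zero_optimization layout. Note that the rejected value 15099494.4 is exactly 0.9 × 4096 × 4096, which looks like what the auto setting expands to for this model, so rounding it down to a plain integer is the workaround.)

```python
import json

# Sketch only: force stage3_prefetch_bucket_size to an integer, because
# DeepSpeed's pydantic config validation rejects the fractional 15099494.4.
# The path and key layout below are assumptions based on this thread's script.
path = "ds_config/stage3.json"
with open(path) as f:
    cfg = json.load(f)

cfg["zero_optimization"]["stage3_prefetch_bucket_size"] = 15099494  # int, not 15099494.4

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```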
Yes, that works, but then a new error comes up:
Maintainers, please take a look at the tokenizer. I've already tried all the official suggestions; here is where things stand now:
Maintainers, please take a look at the tokenizer.
---------------------------------------------------------------------------
File ~/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py:723, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
File ~/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1854, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, *init_inputs, **kwargs)
File ~/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2090, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, *init_inputs, **kwargs)
File ~/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:861, in SpecialTokensMixin.sanitize_special_tokens(self)
File ~/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1004, in SpecialTokensMixin.add_tokens(self, new_tokens, special_tokens)
File ~/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/tokenization_utils.py:421, in PreTrainedTokenizer._add_tokens(self, new_tokens, special_tokens)
File ~/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/tokenization_utils.py:582, in PreTrainedTokenizer.convert_tokens_to_ids(self, tokens)
File ~/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/tokenization_utils.py:595, in PreTrainedTokenizer._convert_token_to_id_with_added_voc(self, token)
File ~/.cache/huggingface/modules/transformers_modules/glm-4-9b/tokenization_chatglm.py:96, in ChatGLM4Tokenizer._convert_token_to_id(self, token)
KeyError: '<|endoftext|>'
Attaching the error from running './scripts/glm4_longwriter.sh': KeyError: '<|endoftext|>'
I ran into exactly the same error:
Hi, please use the tokenizer code from LongWriter-glm4-9b; the current training code does not support the tokenizer from the latest GLM-4-9b release.
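A minimal sketch of that suggestion, assuming the tokenizer is loaded from the THUDM/LongWriter-glm4-9b repo on the Hub (a local directory with the same tokenizer files works too):

```python
from transformers import AutoTokenizer

# Load the tokenizer shipped with LongWriter-glm4-9b instead of the one from
# the newest glm-4-9b release; its tokenization_chatglm.py still resolves the
# special tokens that the training code looks up.
tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/LongWriter-glm4-9b",  # or a local path containing that tokenizer
    trust_remote_code=True,
)

# Sanity check: this is exactly the lookup that raised KeyError above.
print(tokenizer.convert_tokens_to_ids("<|endoftext|>"))
```

In practice this means either pointing --model_name_or_path at a directory that contains these tokenizer files, or copying the LongWriter-glm4-9b tokenizer files over the ones in the local glm-4-9b directory.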
RuntimeError: shape '[32768, -1, 1, 32, 2]' is invalid for input of size 524288
Now I'm getting this error too.
Now I'm seeing two types of errors.

Error 1: RuntimeError: shape '[32768, -1, 1, 32, 2]' is invalid for input of size 524288. With one setting in /ds_config/stage3.json, the run gets all the way to the wandb screen but fails as soon as training starts:

File "/root/anaconda3/envs/glm-4-copy/lib/python3.11/site-packages/transformers/trainer.py", line 2679, in training_step
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/root/.cache/huggingface/modules/transformers_modules/glm-4-9b/modeling_chatglm.py", line 146, in apply_rotary_pos_emb
    rope_cache = rope_cache[:sq]
    xshaped = x.reshape(sq, -1, np, rot_dim // 2, 2)
    rope_cache = rope_cache.view(sq, -1, 1, xshaped.size(3), 2)
    ~~~~~~~~~~~~~~~ <--- HERE
    x_out2 = torch.stack(
        [
RuntimeError: shape '[32768, -1, 1, 32, 2]' is invalid for input of size 524288

Error 2: Input should be a valid integer, got a number with a fractional part. With the other setting in /ds_config/stage3.json, the run fails before wandb even starts:
Hi, is there a solution yet? I'd still like to run the training.
Hi, from the error message it looks like the run is still using the original glm-4-9b
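A quick shape check on Error 1 that is consistent with this (my own arithmetic, not from the thread): the failing view needs 1 × 32 × 2 = 64 elements per position, and the 524288 elements in rope_cache cover only 8192 positions, while the packed batch is 32768 tokens long, i.e. the rotary cache was apparently built with the original glm-4-9b sequence settings.

```python
# Illustrative arithmetic behind "shape '[32768, -1, 1, 32, 2]' is invalid
# for input of size 524288"; the numbers come from the traceback above.
sq = 32768                 # packed sequence length used for training
per_position = 1 * 32 * 2  # trailing dims of the requested view
rope_elements = 524288     # elements actually present in rope_cache

print(rope_elements // per_position)             # 8192 -> cache covers only 8192 positions
print(rope_elements % (sq * per_position) == 0)  # False -> view() cannot succeed
```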
For Error 2, please set
Judging from issue hiyouga/LLaMA-Factory#5252,
System Info / 系統信息
I've tried Transformers 4.43, 4.44, and 4.33, and also replaced modeling_chatglm.py, but running the final .sh file still fails with an error similar to what others reported.
I'd suggest the maintainers document the training procedure in more detail.
Who can help? / 谁可以帮助到您?
Information / 问题信息
Reproduction / 复现过程
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.735379934310913 seconds
Traceback (most recent call last):
File "/root/AI4E/ljc/LongWriter/train/main.py", line 130, in
train()
File "/root/AI4E/ljc/LongWriter/train/main.py", line 126, in train
trainer.train(resume_from_checkpoint=False)
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
return inner_training_loop(
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/transformers/trainer.py", line 2095, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/accelerate/accelerator.py", line 1303, in prepare
result = self._prepare_deepspeed(*args)
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/accelerate/accelerator.py", line 1779, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/deepspeed/init.py", line 179, in initialize
config_class = DeepSpeedConfig(config, mpu, mesh_device=mesh_device)
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 797, in init
self._initialize_params(copy.copy(self._param_dict))
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 817, in _initialize_params
self.zero_config = get_zero_config(param_dict)
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/deepspeed/runtime/zero/config.py", line 71, in get_zero_config
return DeepSpeedZeroConfig(**zero_config_dict)
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/deepspeed/runtime/config_utils.py", line 57, in init
super().init(**data)
File "/root/anaconda3/envs/glm-4-copy/lib/python3.10/site-packages/pydantic/main.py", line 193, in init
self.pydantic_validator.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for DeepSpeedZeroConfig
stage3_prefetch_bucket_size
Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=15099494.4, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/int_from_float
[2024-08-28 12:38:44,068] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 282936
[2024-08-28 12:38:44,901] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 282937
[2024-08-28 12:38:46,425] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 282938
[2024-08-28 12:38:46,443] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 282939
[2024-08-28 12:38:46,452] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 282940
[2024-08-28 12:38:46,460] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 282941
[2024-08-28 12:38:46,460] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 282942
[2024-08-28 12:38:46,469] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 282943
[2024-08-28 12:38:46,478] [ERROR] [launch.py:325:sigkill_handler] ['/root/anaconda3/envs/glm-4-copy/bin/python', '-u', 'main.py', '--local_rank=7', '--model_name_or_path', '/root/AI4E/share/glm-4-9b', '--train_file', './data/glm4/longwriter', '--output_dir', './output/glm4/longwriter', '--num_train_epochs', '4', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--save_strategy', 'steps', '--save_steps', '400', '--save_total_limit', '10', '--preprocessing_num_workers', '64', '--learning_rate', '1e-5', '--weight_decay', '0.1', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_dir', './logs/', '--deepspeed', 'ds_config/stage3.json', '--bf16', '--gradient_checkpointing', '1', '--adam_beta1', '0.9', '--adam_beta2', '0.95', '--report_to', 'wandb', '--run_name', 'glm4_longwriter', '--logging_steps', '1', '--batch_method', 'pack', '--pack_loss'] exits with return code = 1
Expected behavior / 期待表现