This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

IndexError: InlinedVector::at(size_type) const failed bounds check #786

Open
TonyTangYu opened this issue Nov 27, 2022 · 1 comment

@TonyTangYu

Please describe the bug
Hello Alpa team,
I ran the benchmark on a machine with 8 GPUs. When I run the command 'python benchmark --suite gpt.grid_search_auto', I get the error shown in the screenshot below.
According to the printed output, the error occurs while compiling all stages during profiling for submesh (1, 4). Profiling for submesh (1, 8) completes without errors.

System information and environment

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04, docker): Linux Ubuntu 16.04 with 8 GPUs.
  • Python version: 3.7.12
  • CUDA version: cuda 11.1
  • NCCL version: 2.8.4
  • cupy version: cupy-cuda111 11.0.0
  • GPU model and memory: A100 80GB
  • Alpa version: 1.0.0.dev0
  • TensorFlow version: 2.9.1
  • JAX version: 0.3.5

To Reproduce
Steps to reproduce the behavior:

  1. python gen_prof_database.py --max-comm-size-intra-node 32 --max-comm-size-inter-node 29
  2. python benchmark --suite gpt.grid_search_auto
  3. See error
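
For anyone trying to reproduce this, the steps above can be wrapped in a small script that saves the full output to a text file, so the traceback can be pasted here as text instead of a screenshot. This is only a sketch: the two commands are copied verbatim from the steps above, while `run_steps`, `REPRO_STEPS`, and the `repro.log` file name are hypothetical names introduced for illustration and are not part of Alpa.

```python
import shlex
import subprocess

# The two commands are copied verbatim from the reproduction steps.
REPRO_STEPS = [
    "python gen_prof_database.py --max-comm-size-intra-node 32 --max-comm-size-inter-node 29",
    "python benchmark --suite gpt.grid_search_auto",
]


def run_steps(steps, log_path="repro.log", runner=subprocess.run):
    """Run each step in order, appending stdout and stderr to log_path.

    Stops at the first step that exits with a non-zero code and returns
    (failed_command, return_code), or (None, 0) if every step succeeds.
    `runner` defaults to subprocess.run and can be swapped out for testing.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        for command in steps:
            # Record which command produced the output that follows.
            log.write(f"$ {command}\n")
            log.flush()
            result = runner(
                shlex.split(command),
                stdout=log,
                stderr=subprocess.STDOUT,  # keep the traceback in the same file
            )
            if result.returncode != 0:
                return command, result.returncode
    return None, 0
```

Calling `run_steps(REPRO_STEPS)` from the directory that contains the benchmark scripts should leave the complete output, including the `IndexError` traceback, in `repro.log`.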

Screenshots
[Image: screenshot of the error, "Screenshot 2022-11-26 19 14 23"]

Could you please help me resolve this? Thanks a lot.

@caixiiaoyang

Have you solved this? I am running into the same error.
