The DP distillation model (model.pth produced by dp --pt freeze) encounters an MPI error when run with LAMMPS for a system of 32,000 atoms, but works well for a 2-atom system #4421
Comments
We do not accept Chinese issues. Please translate them properly into English.
Updated.
Could you post your log.lammps file?
32,000 atoms may trigger an out-of-memory error.
The attached files are log.lammps and out_lmp. Regarding the suggestion that 32,000 atoms may trigger out-of-memory issues: I have not seen any warnings or errors indicating such a problem. In my experience, a V100 32 GB GPU can handle nearly 90,000 atoms without out-of-memory errors when using the TensorFlow backend model.
My suggestion is to test on a smaller system to check the correctness of the algorithm. |
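One way to run such a correctness check is with dp test on a held-out data system; this is only a sketch, assuming dp test accepts the PyTorch .pth model with the same --pt backend flag used for freezing, and the data path is a placeholder:
# Sketch: evaluate the frozen PyTorch model against reference data.
# /path/to/validation_system is a placeholder for a DeepMD-format data directory.
dp --pt test -m model.pth -s /path/to/validation_system -n 100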
Bug summary
The model.pth obtained using the PyTorch backend of DeepMD-kit 3.0.0 encounters an MPI error when run with LAMMPS for a system of 32,000 atoms, but works well for a 2-atom system.
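For reference, the frozen model was produced roughly as follows (a sketch based on the command named in the issue title; it is assumed to be run in the training directory containing the PyTorch checkpoint):
# Freeze the PyTorch-backend checkpoint into a portable model file.
dp --pt freeze -o model.pth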
DeePMD-kit Version
3.0.0
Backend and its version
PyTorch
How did you download the software?
Offline packages
Input Files, Running Commands, Error Log, etc
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
Proc: [[2036,0],0]
Errorcode: 1
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
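The thread-related warnings above can be addressed by exporting the variables before launching LAMMPS; a minimal sketch, with placeholder values that should be tuned to the node's CPU cores (see https://deepmd.rtfd.io/parallelism/):
# Placeholder thread counts; tune to the available CPU cores.
export DP_INTRA_OP_PARALLELISM_THREADS=4
export DP_INTER_OP_PARALLELISM_THREADS=2
export OMP_NUM_THREADS=4
lmp -in in.zbl > out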
Steps to Reproduce
32000atoms_lmp.zip
Bohrium platform job configuration (job.json):
{
"job_name": "7alloy",
"command": "lmp -in in.zbl > out",
"log_file": "run_log",
"job_type": "container",
"backward_files": [],
"project_id": 190380,
"platform": "ali",
"disk_size": 200,
"machine_type": "1 * NVIDIA V100_32g",
"image_address": "registry.dp.tech/dptech/prod-12166/deepmd-kit-v3-dpgen2-zbl:v3"
}
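A possible way to narrow down where the MPI_ABORT comes from is to rerun the 32,000-atom case on a single MPI rank and keep the full LAMMPS log for inspection; this single-rank command is only an illustration, not the command used on Bohrium:
# Single-rank debugging run; -log writes the full LAMMPS log file.
mpirun -np 1 lmp -in in.zbl -log log.lammps > out_lmp 2>&1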
Further Information, Files, and Links
Example input for the 2-atom system that works well:
LAMMPS input (in.zbl):
# LAMMPS input script to calculate the energy of two atoms
units metal
dimension 3
boundary p p p
atom_style atomic
read_data 03.dat
mass 2 180.95 #Ta
mass 4 51.996 #Cr
mass 5 55.845 #Fe
mass 6 58.963 #Ni
mass 1 47.867 #Ti
mass 3 26.982 #Al
mass 7 58.933 #Co
###--------------------Force Field-------------------------------
pair_style deepmd model.pth
pair_coeff * * Ti Ta Al Cr Fe Ni Co
thermo 1
thermo_style custom step pe
run 0 #
variable energy equal pe
print "${energy}" append energies.txt
03.data
position data for Lammps generated by PYTHON
2 atoms
7 atom types
-15.800000 15.800000 xlo xhi
-15.800000 15.800000 ylo yhi
-15.800000 15.800000 zlo zhi
Atoms
1 2 0.000000 0.000000 0.000000
2 2 0.000000 0.000000 0.300000
job.json
{
"job_name": "7alloycompress",
"command": "lmp -in in.zbl > out",
"log_file": "run_log",
"job_type": "container",
"backward_files": [],
"project_id": 190380,
"platform": "ali",
"disk_size": 200,
"machine_type": "1 * NVIDIA V100_32g",
"image_address": "registry.dp.tech/dptech/prod-12166/deepmd-kit-v3-dpgen2-zbl:v3"
}