There are many observed cases of dsync getting stuck. At some point during the transfer, it stops writing its periodic status updates, and there is essentially no I/O on the file systems being synced. In addition, all but one of the dsync processes sit near 100% CPU usage and stay there. The job can stay stuck for days and has to be killed manually using the job manager.
This happens during syncs from one Lustre file system to another.
v0.11.1
This is running on TOSS 4, which is based on RHEL, with the 4.18.0-553.22.1 kernel.
I'm not sure which MPI it was compiled with.
The Lustre version in use is lustre-2.12.9_11.llnl for the clients, routers, and servers. However, the clients and routers will be moving to a 2.15 version of Lustre soon.