Able to run with "nrsmpi turbpipe 1" but not "nrsmpi turbpipe 2/4/8" #250
-
Hi Tony, nrspre is a script that calls nekrs in build-only mode on a single processor. As a result, at runtime nekRS only needs to load the kernels instead of compiling everything. I suggest you first test the code on a single node in interactive mode. For multiple nodes, you have to modify the submit script. @aprilnovak has work-in-progress documentation for NekRS, but I'm not sure if it's ready for the public. Hope this helps,
-
I think this is related to a bug in UCX: #201
-
I believe it was resolved by certain UCX settings on my machine. Granted,
we are running on a supercomputer in Australia, so it may not be helpful for
you, but we include the following settings in our Slurm script:
export UCX_MEM_MMAP_RELOC=n
export UCX_MEM_MALLOC_HOOKS=n
export UCX_MEM_MALLOC_RELOC=n
export UCX_MEM_EVENTS=n
export UCX_MEMTYPE_CACHE=n
backend=CUDA
env | grep 'UCX\|OMPI' | sort
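For readers who want to try these settings, here is a minimal sketch of how they might sit in a job script. It is not the poster's actual script: the `case_name` and `ntasks` variables are hypothetical placeholders, the scheduler directives are omitted because they are site-specific, and the `command -v` guard is only there so the snippet can run on a machine without nekRS installed. The `nrsmpi <case> <ntasks>` form follows the commands quoted in this thread.

```shell
#!/bin/bash
# Hypothetical placeholders, not taken from the original script.
case_name=turbPipe
ntasks=4

# UCX settings as quoted in the reply above.
export UCX_MEM_MMAP_RELOC=n
export UCX_MEM_MALLOC_HOOKS=n
export UCX_MEM_MALLOC_RELOC=n
export UCX_MEM_EVENTS=n
export UCX_MEMTYPE_CACHE=n

# Print the effective UCX / Open MPI environment so it ends up in the job log.
env | grep -E '^(UCX|OMPI)_' | sort

# Launch only if the nekRS wrapper script is on PATH.
if command -v nrsmpi >/dev/null 2>&1; then
  nrsmpi "$case_name" "$ntasks"
else
  echo "nrsmpi not found on PATH; skipping launch"
fi
```

Whether these settings help likely depends on the UCX and MPI builds on the cluster, so the printed environment is worth checking in the job output before drawing conclusions.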
…On Sat, Feb 19, 2022 at 10:51 AM zongzilin ***@***.***> wrote:
Has this bug been fixed? I am having a similar problem on nekRS-21.1:
[hpc217:264304] Read -1, expected 43016, errno = 14
[hpc217:264304] *** Process received signal ***
[hpc217:264304] Signal: Segmentation fault (11)
[hpc217:264304] Signal code: Invalid permissions (2)
[hpc217:264304] Failing at address: 0x2b30f1dbd800
[hpc217:264303] Read -1, expected 43016, errno = 14
[hpc217:264303] *** Process received signal ***
[hpc217:264303] Signal: Segmentation fault (11)
[hpc217:264303] Signal code: Invalid permissions (2)
-
Hi,
I've managed to successfully run the turbPipe case for 1 CPU/GPU. It looks pretty fast.
However, I am now trying to run the case with 4 CPU/GPUs. I tend to execute the following commands in succession:
The first thing I will note is that, whilst nrspre runs successfully, it reports that it is only using 1 MPI task, despite the fact that I have provided it with 4 MPI ranks and 4 GPUs. It does state that it is jit-compiling for > 4 MPI tasks, so maybe this is by design. I just found it confusing given that the GitHub front page tells you to run, for example, nrspre ethier 2. (This isn't the case with nrsmpi.)
The issue I run into when running with nrsmpi turbPipe 2:
The issue I run into when running with nrsmpi turbPipe 4:
The issue I run into when running with nrsmpi turbPipe 8:
Any help really appreciated,
Tony