Issue with QE v6.3 with and without openmp #10

rolly-ng · 2018-09-11T18:26:07Z

Hi,
I have Parallel Studio 2017 Update 7 and have successfully compiled ELPA 2017.11.001 and then QE v6.3 via the configure-xxx-hsw.sh script.
QE v6.3 runs fine on a single node of the cluster, i.e. srun -p ABC -N 1 -n 176 pw.x < my.in > my.out.
However, as soon as I run on 2 nodes, i.e. srun -p ABC -N 2 -n 352 pw.x < my.in > my.out, it fails with the error "Error in routine cdiaghg problems computing cholesky".
If I compile ELPA and QE with the configure-xxx-hsw-omp.sh script, a single node also works. However, on 2 nodes it produces the message "PMPI_Group_incl: Invalid rank, error stack:" in slurm-xxx.out.
Could you please have a look at QE v6.3?

Moreover, a conventional build without Xconfigure runs fine across multiple nodes, i.e. ./configure CC=icc CXX=icpc F77=ifort F90=ifort MPIF90=mpiifort --enable-shared --enable-parallel --disable-openmp --with-scalapack=intel CFLAGS="-O3 -I -xCORE-AVX2" CXXFLAGS="-O3 -I -xCORE-AVX2" FCFLAGS="-O3 -I -xCORE-AVX2" F90FLAGS="-O3 -I -xCORE-AVX2" FFLAGS="-O3 -I -xCORE-AVX2"

Thanks,
Rolly

hfp · 2018-09-25T13:05:48Z

Thank you for the report! At first glance, this looks like a problem that only occurs when ELPA is incorporated. I may step back from ELPA as a default in Xconfigure, or find a version that works again.

rolly-ng · 2018-09-26T04:25:01Z

Hi Hans,
I have done some further tests and found that -D__NON_BLOCKING_SCATTER in the QE make.inc causes the problem.
I compiled ELPA as instructed, then removed this flag from the QE make.inc. v6.3 now runs, but I have to use pw.x -nk 2 to get a parallel speedup. Otherwise, 2 nodes run slower than 1 node on the AUSURF112 benchmark.
I am not sure whether -nk 2 is a proper fix for the problem.
Thanks,
Rolly
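
For readers who want to try the two-node run with k-point pools described in the comment above, here is a minimal sketch. The function name pw_pool_cmd is hypothetical, and the partition ABC, my.in, and my.out are the placeholders already used in this thread. The divisibility check is an assumption about how ranks are split among pools, not something stated in the thread. The function only prints the command line; it does not launch anything.

```shell
# Hypothetical helper: print an srun line that runs pw.x with k-point pools (-nk).
# Partition "ABC", my.in and my.out are the placeholders used in this thread.
pw_pool_cmd() {
    nodes="$1"   # number of nodes, e.g. 2
    ranks="$2"   # total number of MPI ranks, e.g. 352
    pools="$3"   # number of k-point pools passed to -nk, e.g. 2
    # Assumption: the rank count should divide evenly among the pools.
    if [ $((ranks % pools)) -ne 0 ]; then
        echo "ranks ($ranks) not divisible by pools ($pools)" >&2
        return 1
    fi
    echo "srun -p ABC -N $nodes -n $ranks pw.x -nk $pools < my.in > my.out"
}

# Print the two-node command line from the comment above.
pw_pool_cmd 2 352 2
```

Whether -nk 2 helps depends on the input having at least two k-points to distribute, so the value should be adapted to the case at hand.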

hfp self-assigned this Sep 23, 2018

hfp added a commit that referenced this issue Sep 27, 2018

Issue #10: removed -D__NON_BLOCKING_SCATTER.

1579710
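
For a make.inc generated before this commit, the same flag can be removed by hand. The sketch below is one possible way to do it and is not taken from the commit itself. The function name strip_define is hypothetical, and it assumes the flag appears as a plain whitespace-separated token in make.inc. A backup copy is kept next to the file.

```shell
# Hypothetical helper: remove one preprocessor define from a make.inc-style file.
# Assumes the flag is a plain token such as -D__NON_BLOCKING_SCATTER
# (no characters that are special in a sed pattern).
strip_define() {
    flag="$1"   # e.g. -D__NON_BLOCKING_SCATTER
    file="$2"   # e.g. make.inc
    # Keep the original as <file>.bak, then write the filtered copy back.
    cp "$file" "$file.bak"
    # Drop the flag together with the whitespace in front of it.
    sed "s/[[:space:]]*${flag}//g" "$file.bak" > "$file"
}
```

Usage would look like `strip_define -D__NON_BLOCKING_SCATTER make.inc`, followed by a rebuild of QE so that the changed flags take effect.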
