Initial setup scripts #1

Open
tobigithub opened this issue Jun 7, 2021 · 1 comment
Labels
enhancement New feature or request

Comments


tobigithub commented Jun 7, 2021

Hi,
congratulations on the initial release. Very exciting stuff to have everything automated.
The initial release just came out 3 days ago, so I guess this is all very fresh.
For users who are not familiar with the program, it would be nice to have two additional scripts in the release.

  1. It would be nice to have an .envrc file and a source script to export the appropriate variables.
  2. It would be nice to set up the appropriate OMP variables for parallel execution.

For example, without any restrictions the program uses all hardware threads instead of just the physical CPU cores.
In my case this seems to completely oversaturate the machine (I believe). So OMP_NUM_THREADS could
also be set in the setup script, but to the number of cores, not threads.

.envrc (just an example from my environment, not universal)

export XTBHOME=/home/ubuntu/QCxMS.v5.0/.XTBPARAM
export PATH=$PATH:/home/ubuntu/QCxMS.v5.0
export OMP_NUM_THREADS=112
ulimit -s unlimited

and the command to source the .envrc (from my environment, not universal)

source /home/ubuntu/QCxMS.v5.0/.envrc
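
A more portable variant of the .envrc above could look roughly like the sketch below. It is only an illustration, not something shipped with QCxMS: the QCXMS_DIR variable and the count_physical_cores helper are hypothetical names, and the core count relies on lscpu from util-linux, so it assumes a Linux host.

```shell
#!/bin/sh
# Hypothetical install location (assumption); adjust to where QCxMS was unpacked.
QCXMS_DIR="${QCXMS_DIR:-$HOME/QCxMS.v5.0}"

# Count unique (core, socket) pairs in `lscpu -p=CORE,SOCKET` style input,
# i.e. physical cores rather than hardware threads.
count_physical_cores() {
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

export XTBHOME="$QCXMS_DIR/.XTBPARAM"
export PATH="$PATH:$QCXMS_DIR"

# Only set OMP_NUM_THREADS from the hardware when lscpu is available.
if command -v lscpu >/dev/null 2>&1; then
    OMP_NUM_THREADS="$(lscpu -p=CORE,SOCKET | count_physical_cores)"
    export OMP_NUM_THREADS
fi

# Raise the stack limit; warn instead of aborting if the shell refuses.
ulimit -s unlimited 2>/dev/null || echo "could not raise stack limit" >&2
```

Sourcing this file sets OMP_NUM_THREADS to the number of physical cores, which is the value suggested above for a single pqcxms process.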

Also, when OMP_NUM_THREADS is set to the full count on a single NUMA node (not multiple cluster nodes), the pqcxms argument has to be set to one (pqcxms 1), otherwise the threads are oversubscribed.
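
To avoid that oversubscription, the pqcxms argument can be derived from the core count and OMP_NUM_THREADS so that their product never exceeds the number of cores. A minimal sketch, where procs_for is a hypothetical helper and not part of QCxMS:

```shell
# procs_for CORES THREADS: print how many pqcxms processes fit when each
# one runs THREADS OpenMP threads on a machine with CORES physical cores.
procs_for() {
    cores=$1
    threads=$2
    procs=$(( cores / threads ))
    # Always run at least one process, even if THREADS exceeds CORES.
    if [ "$procs" -lt 1 ]; then
        procs=1
    fi
    echo "$procs"
}

# Example use (not executed here):
#   export OMP_NUM_THREADS=2
#   pqcxms "$(procs_for 48 "$OMP_NUM_THREADS")"
```

With OMP_NUM_THREADS equal to the core count this yields 1, which matches the pqcxms 1 case described above.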

Best
Tobias



tobigithub commented Aug 19, 2021

Actually, for small molecules or similar it's better to just run export OMP_NUM_THREADS=2.
I ran the test on another machine with 48 true CPU cores (96 threads). Keeping OMP_NUM_THREADS times the pqcxms process count exactly at the true core count of 48, export OMP_NUM_THREADS=2 was the fastest. Again, this is just a micro benchmark, and it may vary on other platforms. For PBS, SLURM or TORQUE it's probably similar: 2-4 OMP_NUM_THREADS will be the best.

export OMP_NUM_THREADS=1
time pqcxms 48
real    4m15.585s
user    44m26.680s
sys     0m42.070s

export OMP_NUM_THREADS=2
time pqcxms 24
real    2m57.483s
user    63m39.573s
sys     0m46.494s

export OMP_NUM_THREADS=4
time pqcxms 12
real    3m55.269s
user    253m8.834s
sys     5m6.752s


export OMP_NUM_THREADS=8
time pqcxms 6
real    8m27.914s
user    358m16.833s
sys     2m13.541s
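
To compare the wall-clock ("real") times above more directly, they can be converted to seconds and divided. This is just a small sketch; to_seconds and ratio are hypothetical helper names, and the two inputs are the real times of the first two runs.

```shell
# to_seconds: convert a `time`-style value such as 2m57.483s to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# ratio A B: print A / B with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

base=$(to_seconds 4m15.585s)   # OMP_NUM_THREADS=1, pqcxms 48
best=$(to_seconds 2m57.483s)   # OMP_NUM_THREADS=2, pqcxms 24
echo "speedup of 2 threads x 24 processes over 1 x 48: $(ratio "$base" "$best")"
```

For these two runs the ratio comes out at about 1.44, so the OMP_NUM_THREADS=2 run finished in roughly 70% of the wall-clock time of the OMP_NUM_THREADS=1 run.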
