OpenMP
The OpenMP directives allow HYCOM to run on multiple processors of shared memory machines. They can also be used in conjunction with MPI domain decomposition (MPI between multi-processor nodes, OpenMP on each node). This mode of parallelization is typically best for a relatively modest number of processors (2, 3, 4, 6, 8), although more can profitably be used on large grids.
The PARAMETER mxthrd has been added to mod_dimensions.F90 and dimensions.h. Each outer (i or j) loop is divided into mxthrd pieces by OpenMP, so set mxthrd to a multiple of NOMP (i.e. OMP_NUM_THREADS). It is often best to set mxthrd larger than NOMP, because that tends to give better land/sea load balance between threads. For example, mxthrd=16 could be used with 2, 4, 8 or 16 threads. Other good choices are 12, 24, 32 and so on. Large values of mxthrd are only likely to be optimal for
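The load-balance argument for choosing mxthrd larger than NOMP can be illustrated with a small Python sketch. It is not HYCOM code: the helper names (`split_rows`, `assign_round_robin`, `work_per_thread`) and the grid are made up, and dealing the pieces to threads in turn is an assumption that mimics a static, chunked OpenMP schedule. It only shows the arithmetic of splitting an outer loop into mxthrd pieces.

```python
# Illustration only: hypothetical helpers, not HYCOM code.

def split_rows(n_rows, mxthrd):
    """Split rows 0..n_rows-1 into at most mxthrd contiguous pieces."""
    block = max(1, (n_rows + mxthrd - 1) // mxthrd)  # ceiling division
    return [range(start, min(start + block, n_rows))
            for start in range(0, n_rows, block)]


def assign_round_robin(pieces, n_threads):
    """Deal pieces to threads in turn (assumed static, chunked schedule)."""
    owned = [[] for _ in range(n_threads)]
    for k, piece in enumerate(pieces):
        owned[k % n_threads].append(piece)
    return owned


def work_per_thread(sea_points_per_row, mxthrd, n_threads):
    """Count the sea points each thread would process."""
    pieces = split_rows(len(sea_points_per_row), mxthrd)
    owned = assign_round_robin(pieces, n_threads)
    return [sum(sea_points_per_row[j] for piece in mine for j in piece)
            for mine in owned]


# Made-up grid: 32 rows, southern half ocean (10 sea points per row),
# northern half all land (0 sea points per row).
sea_points_per_row = [10] * 16 + [0] * 16

print(work_per_thread(sea_points_per_row, mxthrd=4, n_threads=4))
# [80, 80, 0, 0]
print(work_per_thread(sea_points_per_row, mxthrd=16, n_threads=4))
# [40, 40, 40, 40]
```

With mxthrd equal to the thread count, two threads receive only land rows and sit idle. With mxthrd=16 each thread receives a mix of ocean and land pieces, so the work is even.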
Parallel sums are not performed via an OpenMP REDUCTION clause, because a REDUCTION is not bit-for-bit reproducible when run twice on the same data sets. Instead, row sums are done explicitly in parallel, followed by a serial sum of the row sums. This is probably slower than REDUCTION, but it is bit-for-bit reproducible on any number of processors. Please report any cases where HYCOM gives different answers for different values of OMP_NUM_THREADS (this is not supposed to happen).
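The difference between the two summation strategies comes from floating-point addition not being associative. The Python sketch below is an illustration under stated assumptions, not HYCOM's Fortran: `reproducible_sum` and `reduction_like_sum` are hypothetical names, the field values are contrived to expose rounding, and `reduction_like_sum` is a simplified serial model of a REDUCTION (one partial sum per thread over a contiguous block of rows). A real OpenMP runtime may combine partial sums in a different order.

```python
from concurrent.futures import ThreadPoolExecutor


def row_sum(row):
    """Sum one row left to right, so the order within a row is fixed."""
    s = 0.0
    for v in row:
        s += v
    return s


def reproducible_sum(field, n_threads):
    """Row sums in parallel, then a serial sum of the row sums."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        row_sums = list(pool.map(row_sum, field))  # results kept in row order
    total = 0.0
    for s in row_sums:  # serial, always in the same row order
        total += s
    return total


def reduction_like_sum(field, n_threads):
    """Simplified model of a REDUCTION: one running partial sum per thread."""
    block = max(1, (len(field) + n_threads - 1) // n_threads)
    total = 0.0
    for start in range(0, len(field), block):
        partial = 0.0
        for row in field[start:start + block]:
            for v in row:
                partial += v
        total += partial
    return total


# Contrived values: 1e16 + 1.0 rounds back to 1e16 in double precision.
field = [[1e16, 1.0], [1.0, 1.0], [-1e16, 1.0], [1.0, 1.0]]

print([reproducible_sum(field, n) for n in (1, 2, 4)])
# [4.0, 4.0, 4.0]
print([reduction_like_sum(field, n) for n in (1, 2, 4)])
# [3.0, 0.0, 4.0]
```

Neither approach returns the exact sum (6.0) for this data. The point is that the row-sum result does not change with the number of threads, because each row is summed in a fixed order and the final sum is serial.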