How to Build Various Things

While the LFRic build system is a useful starting point for understanding how to build Fortran programs, and in particular all the special source generation used, it is a working system with all the complication and imperfection that implies. This page is an attempt to provide more concise directions. Hopefully it can be used in conjunction with the LFRic build system to aid understanding.

Examples are given in a pseudocode based on Makefile syntax. These will not work as real makefiles, but they should provide a concise summary.

Fortran

Recent bug fixes to the LFRic build system have sharpened our understanding of how to build Fortran source files and how the dependencies between them work.

The first thing to note is that common practice among Fortran compilers is to use the file extension to indicate whether a source file needs preprocessing: uppercase .F90 indicates that it does, lowercase .f90 that it does not.
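
For illustration, a hypothetical .F90 file containing conditional compilation, which is why it must pass through the preprocessor:

! debug_output.F90 - uppercase extension, so the compiler preprocesses it.
! Hypothetical example; the DEBUG macro is purely illustrative.
module debug_output_mod
  implicit none
contains
  subroutine report(message)
    character(*), intent(in) :: message
#ifdef DEBUG
    write(*, '(A)') 'DEBUG: ' // message
#endif
  end subroutine report
end module debug_output_mod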

The only dependency implied by compilation is of source file to object file. This is a one-to-one transformation where one source file generates one object file.

A source file may contain one or more modules and the compiler will generate a module file for each of these. Such module files are named after the module they represent, not the source file from which they come. Although these module files are dependent on the source file used to generate them, it is more useful to think of them as depending on the object file they are generated alongside.
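
For example, a single source file defining two modules compiles to one object file but two module files, a sketch matching the names used in the summary below:

! fortran.f90 - compiles to fortran.o and also produces module_one.mod
! and module_two.mod, named after the modules rather than the file.
module module_one
  implicit none
  integer :: counter = 0
end module module_one

module module_two
  implicit none
contains
  subroutine greet()
    write(*, '(A)') 'hello'
  end subroutine greet
end module module_two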

Where a Fortran source takes advantage of the use statement to reference a module, it is not uncommon to represent this as a dependency of one object file on another. Although this will work, it is not strictly correct and leads to unnecessary rebuilding.

The purpose of the module file is to hold the API for the module it represents. It only changes if the API changes, as opposed to the object file, which changes every time the implementation changes. Thus the correct dependency is of the object file on the module files it uses. That way the object is only rebuilt if the APIs it is calling change.
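
As a sketch, assuming the two modules above, a program which uses module_two should be recompiled only when module_two.mod changes, not every time the implementation in fortran.f90 does:

! pp_fortran.F90 - the use statement makes pp_fortran.o depend on
! module_two.mod, not on fortran.o.
program pp_fortran
  use module_two, only : greet
  implicit none
  call greet()
end program pp_fortran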

Sub-modules: Not What You Think They Are

One particular gotcha in building modern Fortran is sub-modules. Everyone, on first seeing them, thinks "Ah-ha! A scoping trick to break up large module files", and they are wrong. Sub-modules exist purely as a workaround for shoddy build systems.

When a module is rebuilt the .mod file should only be regenerated if the API of the module has changed. The build system may use this to understand that things for which the module is a prerequisite only need to be rebuilt if the API (module file) has changed. Otherwise it's just implementation changes which will be swept up at link time.

Unfortunately some less well implemented build systems use the source file of prerequisites to gate the rebuild. Thus every time the implementation changes there is a cascade of rebuilds back to the top level program.

Sub-modules fix this problem by moving the implementation out into a separate source file. In place of the implementation, a massive pile of interface blocks is added. With all this boilerplate the functional effect of sub-modules is to increase the amount of code which has to be maintained, to no good end.

Unfortunately they are in the standard, so we must know how to build them. The way to think of it is that program units can have the module as a prerequisite as normal, but the sub-modules also have their parent module as a prerequisite. The parent module (really its .mod file) must be built before the sub-modules can be built, and they must be rebuilt if the API expressed in the .mod file changes. Changes to the sub-modules do not imply a rebuild of the parent module.
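
A minimal sketch of the arrangement, using the names from the fragment below:

! interface_mod.f90 - the parent module holds only the interface.
module interface_mod
  implicit none
  interface
    module subroutine do_work()
    end subroutine do_work
  end interface
end module interface_mod

! interface_smod.f90 - the sub-module holds the implementation. Changing
! it does not change interface_mod.mod, so users of interface_mod are
! not recompiled.
submodule (interface_mod) interface_smod
  implicit none
contains
  module subroutine do_work()
    write(*, '(A)') 'working'
  end subroutine do_work
end submodule interface_smod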

Summary of Building Fortran

Building Fortran source is summarised by the following Makefile fragment:

# The compiler handles the preprocessing (or not) so the recipes are identical; however, Make needs
# two separate rules.
#
pp_fortran.o: pp_fortran.F90
    $(FC) $(FFLAGS) -c -o $@ $<

fortran.o: fortran.f90
    $(FC) $(FFLAGS) -c -o $@ $<

# Captures the dependency between object files and modules, i.e. which modules the source file references
# with a "use" statement.
#
pp_fortran.o: module_two.mod

# Instructs make on which modules come with which object files. i.e. If you need a module file these
# rules tell you which object file you need to compile.
#
module_one.mod: fortran.o
module_two.mod: fortran.o

# Submodules make everything worse.
#
interface_smod.o: interface_mod.mod
#
# But note that although the build dependency is module before submodule it is best to link them
# submodule before module.
#
smod_example: interface_smod.o interface_mod.o smod_example.o
    $(FC) $(LDFLAGS) $^ -o $@

LFRic Configurator

Model configuration is held in Fortran "namelist" files. These are generated using the Rose tool and read by the model. Rather than maintaining two copies of this metadata and requiring them to be kept in sync we chose to extend the Rose metadata format and generate the namelist loading source from it. This is the job of the "Configurator".

It takes a metadata file in and spits out a Fortran source file for each namelist described therein. Also produced are a source file which orchestrates the loading of all the configuration and one which allows configuration to be faked for testing purposes. This is all done via a pair of intermediate files, which is clumsy but necessary for licensing reasons.

A Makefile fragment to illustrate this:

# The configuration orchestration tool
#
configuration_mod.f90: thing_one_config_mod.f90 thing_two_config_mod.f90 ...
    GenerateLoader $@ $^

# Fake namelists for testing
#
feign_config_mod.f90: config-meta.json
    GenerateFeigns config-meta.json -output $@

# Namelist loader
#
thing_one_config_mod.f90 thing_two_config_mod.f90 ...: config-meta.json
    GenerateNamelist config-meta.json -directory $(OUTPUT_DIR)

# Intermediate files
#
config-meta.json config_namelists.txt: config-meta.conf
    rose_picker $< -directory $(OUTPUT_DIR) -include_dirs $(COMMON_META_DIR)

PSyclone

The heart of the LFRic project. PSyclone uses a set of instructions to transform science code for performance. This involves adding OpenMP or OpenACC directives to give thread parallelism or GPU offload. It also involves interposing MPI to provide process parallelism, plus many other potential optimisations, for instance cache-blocking of loops.

PSyclone is invoked for each algorithm source with a transformation script and produces a rewritten algorithm source, a PSy (Parallel System layer) source and potentially (unimplemented) rewritten kernel sources. This process depends on the source of the kernels invoked by the algorithm, on the transformation script and on the command-line arguments used at invocation. A change to any of these requires PSyclone to be rerun.

Note that while kernels are not being rewritten, a change to the kernel implementation does not require a rebuild; only a change to the kernel metadata held in the same file does. Once kernels are being rewritten, a change to any part of the kernel source will require a rebuild.
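
For orientation, a heavily simplified sketch of an algorithm source; the kernel module referenced in the use statement and invoked via call invoke(...) is what PSyclone must be able to find. All names here are hypothetical:

! algorithm.x90 - the kernel referenced here is a prerequisite of the
! PSyclone step. Hypothetical names throughout; real LFRic algorithms
! carry considerably more context.
module algorithm_mod
  use example_kernel_mod, only : example_kernel_type
  use field_mod,          only : field_type
  implicit none
contains
  subroutine run_step(field_in, field_out)
    type(field_type), intent(in)    :: field_in
    type(field_type), intent(inout) :: field_out
    call invoke( example_kernel_type(field_in, field_out) )
  end subroutine run_step
end module algorithm_mod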

flowchart TB
alg[Algorithm.x90] --> psyclone[Psyclone]
alg --> krn[Kernel.f90 - metadata only]
krn --> psyclone
Transformation.py --> psyclone
Arguments --> psyclone
psyclone --> RewrittenAlgorithm.f90
psyclone --> PSyLayer.f90
subgraph Future Requirement
optkern[PerAlgorithmOptimisedKernel.f90]
end
psyclone --> optkern

The reason that command-line arguments must be taken into account is that although the source may be transformed with a script the gross switch which turns off MPI usage is a command-line argument. Turning off MPI (or re-instating it) requires every file in the PSy layer to be rebuilt.

Make doesn't support command-line dependencies so they have been noted in comments:

KERNEL_PREREQS = $(shell grep _kernel_mod algorithm.x90 | awk '{print $$2}' | sed 's/,$$//')
algorithm_psy.f90 algorithm_mutant.f90: algorithm.x90 $(addsuffix .f90, $(addprefix $(KERNEL_DIR)/, $(KERNEL_PREREQS))) transformation.py # command line arguments
    psyclone -api dynamo0.3 -l all -d $(KERNEL_DIR) -s transformation.py $(CLI_ARGS) -opsy algorithm_psy.f90 -oalg algorithm_mutant.f90 algorithm.x90

Note that the mechanism used here to determine kernel prerequisites is clunky and tailored to the way makefiles work. In Fab we can do a much better job of identifying exactly which kernels are mentioned within call invoke(...) lines and looking them up in the use statements.

Be aware that an algorithm which contains no kernel invocations will not generate a PSy layer. In this case a rewritten algorithm source is still created, as an effective "null operation".

There is a requirement to support a global transformation script which may be overridden by per-algorithm-file scripts.

It is also necessary for this automatic source generation process to be bypassed if a handwritten PSy source exists.

pFUnit

Our unit testing framework, pFUnit, is mostly a preprocessor stage. It takes a .pf source file and produces a .F90 source file.

In order to create the test driver, a specially formatted include file must be generated for inclusion by the driver source.
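
A .pf file is essentially Fortran with test directives which the parser expands, a minimal hypothetical sketch:

! test.pf - the @test and @assertEqual directives are expanded by the
! pFUnit parser into plain Fortran in the generated test.F90.
@test
subroutine test_addition()
  use pfunit_mod
  implicit none
  @assertEqual(4, 2 + 2)
end subroutine test_addition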

Pseudo Makefile:

unit_tests: driver.o $(TEST_OBJECTS)
    $(FC) $(LDFLAGS) -o $@ $^

driver.o: $(PFUNIT)/include/driver.F90 testSuites.inc
    $(FC) $(FFLAGS) -c -o $@ $<

testSuites.inc: $(LIST_OF_TESTS)
    echo "! Tests to run" > $@
    for test in $(LIST_OF_TESTS); do echo "ADD_TEST_SUITE($${test}_suite)" >> $@; done

test.F90: test.pf
    pFUnitParser.py $< $@
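
For a single test file test.pf, the loop above would generate an include file containing something like:

! Tests to run
ADD_TEST_SUITE(test_suite)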