FutureExascaleFortranBuildSystem
Paul Cresswell, David Davies, Matthew Hambley, Matt Shin and Stuart Whitehouse
Currently the majority of large Met Office scientific software projects use the in-house FCM tool as a build system. This meets our needs at present, but there is concern that it will not meet those needs in the future.
Firstly, it does not support LFRic's code auto-generation (e.g. PSYclone). This requires two features FCM does not currently support: the ability for multiple source files to produce multiple target files, and the ability to call arbitrary, non-compiler tools.
Secondly, there is a support issue in that the code suffers from a "single point of expertise" problem: it is written in Perl, and few developers and reviewers nowadays are fluent in that language.
Generally in the organisation there is a push towards using external software, to reduce the maintenance overhead and avoid reinventing the wheel. In light of this we have investigated several off-the-shelf build systems, including SCons and CMake. The conclusions we have drawn are listed below. It is worth remembering that Fortran is a niche language in the wider software engineering world, and that to many people "Fortran" means Fortran 77, whereas our requirements include Fortran 2008 support.
There are risks in using external software (licensing, loss of control of development direction, etc), but in some cases the benefits outweigh the risks. The alternative option is to write our own custom software, effectively producing "FCM 2.0".
Note that FCM bundles up a number of capabilities in addition to a build system. We do not intend to consider those in this review.
We have a large and varied set of requirements for a build system, some of which would be considered niche functionality by the average developer.
These have been split into requirements for a source extract system and for the build system.
The extract system collates a source tree from disparate repositories, archives, directories and other locations. This will include any required merging of the trees.
Meanwhile the build system actually builds the software. This may include pre-processing of the source, followed by compilation, then linking.
- Extract source trees from multiple projects and manage their name spaces.
- The resulting directory tree to be fed to a build system downstream.
- The behaviour of the extract should be repeatable given the same configuration.
- Filter and re-root the project tree to obtain an extracted tree that contains only essential files.
- Extract code from version control systems including:
- Subversion.
- Git. (There is a requirement to extract tree-ish objects directly from remote Git clones. This is not currently handled by FCM but will become higher priority as time passes)
- Extract code from other locations:
- Local file system.
- Remote disks via e.g. SSH/RSYNC.
- A tarball. (Not currently handled by FCM.)
- Automatic merging of many source trees inside the same project.
- Install the extracted tree to one or more remote build locations via e.g. SSH/RSYNC.
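The merge requirement above (overlaying many source trees within one project) could be sketched as follows. This is a minimal illustration in Python, not FCM's actual algorithm; the function name is hypothetical, and a real extract system would also detect conflicting edits rather than silently letting later trees win.

```python
from pathlib import Path
import shutil

def merge_source_trees(trees, dest):
    """Overlay an ordered list of source trees into one extracted tree.

    Later trees override earlier ones, mimicking the merge step of an
    extract system. Directory structure is preserved relative to each
    tree root.
    """
    dest = Path(dest)
    for tree in trees:
        tree = Path(tree)
        for src in tree.rglob('*'):
            if src.is_file():
                target = dest / src.relative_to(tree)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, target)
```

In practice a branch tree would be listed after the trunk tree it modifies, so its changed files take precedence in the extracted result.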
- Efficiently compile and link the program code into objects, libraries and
executables.
- Support modern Fortran and related languages.
- Source may be presented as one or more directory trees.
- Only compile the code you need to build the target.
- An automatic and global view of the source files in the source trees and
their name spaces.
- Automatic discovery of source files in source trees.
- Assign a unique hierarchical namespace for each source file.
- The ability to filter out unwanted source files. (Build system may not be used with the extract system, so source trees may need filtering.)
- Ability to create a hierarchy of static or dynamic object libraries, with or without dependent executables to link into them. (Not currently handled by FCM.)
- A task may have multiple inputs and multiple outputs. This is a key requirement for LFRic code auto-generation. (Not currently handled well by FCM.)
- Support multiple versions of the same target per source. (Different versions of targets in alternate locations? E.g. MPI versus non-MPI version, and static versus dynamic libraries.)
- Recognise compile time and link time dependency. E.g. include and module dependency hierarchy for compile, and object and library hierarchy for link.
- Report circular, missing and duplicate (same program unit symbol with 2+ implementations) dependency.
- Behaviour should be repeatable given identical inputs.
- Ability to easily customise build options globally, or by source file
hierarchical name space or target name.
- Compiler, pre-processor and linker.
- Options such as include locations, library locations, external libraries, debug and optimisation levels, switch on/off OpenMP, etc.
- Dependency analysis options. Dependency exclusion. Custom dependency.
- File extension, target selection, rename, etc.
- Support for Fortran (at least up to 2008), C and C++, including support for
hierarchical include files.
- Recognise classic top-level Fortran program units, such as program, subroutine, function, module.
- Submodule. (Handled but not proven by FCM.)
- Intelligent ISO-C binding support. (Not currently handled by FCM.)
- Integration with Psyclone and other Fortran tools.
- Support module-first build - as suggested by L.E. Busby [1], a two-pass compilation (the first pass to generate module files, the second to produce object files) can greatly speed up the build.
- Pre-process before build dependency analysis, if pre-processing affects
dependency of build.
- Actual compile to be performed on the original file instead of the pre-processed file, as it is not always easy to compile pre-processed output.
- Legacy support for interface files and link-time dependency directives.
- Support for running compiled targets as part of the build. (Not currently
handled by FCM)
- In particular, running unit testing frameworks e.g. pfUnit.
[1] L.E. Busby, "A Note on Compiling Fortran", LLNL-TR-738243, Lawrence Livermore National Laboratory.
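The compile-time dependency and circular-dependency requirements above could be illustrated with a simplified scanner for Fortran `use` statements. This is a regex-based sketch only (all function names are hypothetical); a real analyser would also handle submodules, include files, interface blocks, ISO-C bindings and the effects of pre-processing.

```python
import re
from collections import defaultdict

# Match 'use <name>' and top-level 'module <name>' declarations.
USE_RE = re.compile(r'^\s*use\s+(\w+)', re.IGNORECASE | re.MULTILINE)
MOD_RE = re.compile(r'^\s*module\s+(\w+)\s*$', re.IGNORECASE | re.MULTILINE)

def module_dependencies(sources):
    """Build a file-level dependency graph from Fortran sources.

    sources: {filename: source text}. Returns {file: set of files it
    depends on}, by mapping each used module back to its defining file.
    """
    defines = {}            # module name -> defining file
    uses = defaultdict(set) # file -> module names it uses
    for fname, text in sources.items():
        for mod in MOD_RE.findall(text):
            defines[mod.lower()] = fname
        for mod in USE_RE.findall(text):
            uses[fname].add(mod.lower())
    graph = {f: set() for f in sources}
    for fname, mods in uses.items():
        for mod in mods:
            if mod in defines and defines[mod] != fname:
                graph[fname].add(defines[mod])
    return graph

def find_cycle(graph):
    """Return one circular dependency as a list of files, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in graph}
    def visit(n, stack):
        colour[n] = GREY
        stack.append(n)
        for dep in graph[n]:
            if colour[dep] == GREY:          # back edge: cycle found
                return stack[stack.index(dep):]
            if colour[dep] == WHITE:
                cycle = visit(dep, stack)
                if cycle:
                    return cycle
        stack.pop()
        colour[n] = BLACK
        return None
    for n in graph:
        if colour[n] == WHITE:
            cycle = visit(n, [])
            if cycle:
                return cycle
    return None
```

Note that because the scan works on source text, it must run after pre-processing whenever `#ifdef` blocks can change which `use` statements are active - the same ordering issue discussed for CMake later in this document.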
- Support users with a fast development cycle.
- Parallelise expensive operations, e.g. download of source tree, dependency analysis, pre-processing, compilation, etc. Intelligent or configurable ordering of tasks so that slow tasks are done first.
- Incremental operation. Process only those things which have changed or
depend on changed things.
- Examples of things which should be handled in this way are: Configuration, Source tree and Known external items.
- Avoid using time stamp as a mechanism to detect change as it often gives false positives. E.g. A user may touch a file or edit-save-revert-save a file, updating the timestamp without updating the content. There is no need to recompile the file and targets depending on it.
- Manual override of incremental mode.
- Retrigger some tasks regardless of whether the system believes it is necessary. e.g. changing compiler version may not be visible to the system.
- Hierarchical inheritance of build artefacts from one or more locations (e.g. for prebuilds).
- Modularity - so different parts of the system can be used without using the whole thing. E.g. Ability to export the code dependency tree for e.g. graphical viewing.
- Pluggable system for future extension, e.g.:
- Different types of extract or source locations.
- Different source file types.
- Different tasks. E.g. A custom logic to run a new source generator or compiler.
- Programmable API as normal user interface, as well as a traditional
configuration file driven user interface.
- Use simple configuration file for simple configuration.
- Use API for more complex configuration.
- Archive and compress (e.g. TAR-GZIP) the results to save disk space.
- Support usage of temporary storage for temporary files. E.g. RAM disk.
- Easy for users to support building their projects for a large number of different sites, each of which may have many platforms, each in turn with multiple compilers.
- Documented user interface and API. User guide and tutorial.
- Good test coverage - the system should have its own test system to assist in development and maintenance.
- Should be easy to install (e.g. support for pip install) and deploy, and for external users to adapt. Should not require too many additional external dependencies to install.
- Our logic should be developed in a modern language such as Python 3 - something that will be understood by our user base.
- Our logic should be developed and maintained by a team to reduce chance of single point of expertise/failure - users will receive better support.
In addition, the following are considered strong nice-to-have features for a build system:
- The ability to generate a code browser.
- Support integration with IDEs.
- GUI support, e.g. using Jupyter notebook
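The timestamp concern raised above - a touched or edit-save-revert-saved file should not trigger recompilation - is commonly addressed by hashing file contents instead. A minimal sketch (function names are hypothetical, not part of any existing tool):

```python
import hashlib

def content_hash(path):
    """Hash a file's contents; unlike a timestamp, the result is
    unchanged by touch or by edit-save-revert-save."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def needs_rebuild(path, state):
    """Compare the current hash against the recorded one and update
    the state mapping. Returns True only when the content changed."""
    new = content_hash(path)
    changed = state.get(path) != new
    state[path] = new
    return changed
```

The `state` mapping would be persisted between builds (e.g. to disk) so that incremental operation survives across invocations.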
A potential solution to integrating LFRic's source generators is to use fcm-extract to obtain and pre-process the source, then run the source generators, then use fcm-make to build the result. This has been prototyped and proven to work using the existing fcm-make and Rose; however, it was not implemented. This is technically out of scope of this activity, as it merely makes use of existing tools, but is included for completeness.
A hidden cost of this solution is that it still implies some higher-level build system, unless the developer is expected to perform these stages by hand. As such, it was not considered worth the effort to write a new build system which used FCM in this way but replicated the functionality already offered by the current LFRic build system. As it is, fcm-extract has subsequently been integrated into that existing build system in order to obtain UM source. This is needed as we start to bring existing physics into the LFRic Atmosphere model.
This approach also has the same problem as extending fcm-make would have (see section Extend the current fcm-make below), namely that it does not tackle the Perl code issue. It does not achieve some of the additional requirements and nice-to-haves outlined above. Therefore, this could be seen as an unnecessary diversion of resources from a more long-term solution.
It is technically possible to upgrade the current fcm-make tool so that it supports LFRic's requirement for running code generators. However, in the medium- and long-term this is not a sustainable solution as it does not get around the single-point-of-expertise problem. Perpetuating software that already has a maintenance issue (there is very little Perl expertise in Science IT) is unwise and expanding the amount of Perl code is counter to our strategy to adopt Python for scripting. We believe this is an undesirable option.
There are both benefits and risks to using third-party software. If a good fit for our needs already exists, we can save much time and effort by not reinventing the wheel, and whilst we may require enhancements to existing functionality to meet our requirements we can feed these back to the software project itself for inclusion in the next release. There may still be the need for Met Office-specific extensions which either cannot be shared, or will not be accepted by the external project, which will require support and maintenance.
The major disadvantage of using external software is lack of strategic control. This is a risk which must be borne in mind for software which requires operational support. The support burden may increase if the software develops in a direction which no longer meets our requirements, and we may have to take over support entirely (or invest in different software) if development ceases.
The more we contribute to an Open Source project, however, the more influence we will have over the project direction, and there is a reputational benefit to the Met Office in doing so. It is unclear whether the overall resource requirement in using third-party software is lower, as what you gain in development time you may lose in causing extra work for users as the tool may be a less good fit for their needs.
We've investigated a number of off-the-shelf tools, as detailed below. Whilst there may be other tools out there, they are not easily discoverable; the selection below is what we believe to be the prime candidates.
SCons describes itself as a "next-generation build tool", and is intended to replace make, providing an easier, more reliable and faster way of building software. It does support Fortran; however, it has several issues.
Firstly, it is orientated towards a single platform-and-compiler combination; whilst it is possible to make it support multiple platforms and compilers by overriding environment variables, this is clearly not something it was designed to do. Secondly, it seems that Fortran isn't a primary language of the development community, who seem to concentrate on C and C++; in itself this wouldn't be a problem, but the documentation is rather lacking (and in some places plain wrong), so there would likely be a limited amount of support available.
Most critically however, the dependency analyser in SCons is not recursive. Our software has nested dependency trees many layers deep, but SCons can only deal with a single level of dependency. It would be possible to add this, but this seems to be a requirement we have that the SCons community does not.
We think that the effort we would need to invest to enhance SCons to meet our needs is almost as significant as rewriting our own build system from scratch, without any of the advantages in-house software provides (e.g. local support, influence on development).
CMake is not a build system; rather, it generates Makefiles which can then be built using make (Ninja may be used instead). It is a standard tool in Linux environments (though note our RHEL6 scientific desktop has a very old version installed) and has the advantage that it is something developers may be familiar with from previous jobs. It benefits from a large development community, and Fortran is well supported. Building via CMake and make seems slightly slower than fcm-make [2]. Whether this is statistically significant would merit further investigation if we're seriously thinking about CMake.
The key issue with CMake is that it requires you to list all the files to build in a configuration file; it does no dependency analysis at this stage. This is a deliberate decision by the developers: it ensures that CMake knows when files are added or removed, so it can re-invoke itself to regenerate the build system. CMake does perform dependency analysis in the sense that it compiles files in a correct order, but you cannot give it a target program and let it work out which files need compiling; you must specify each file yourself. The dependency analysis also only considers Fortran USE statements in the top-level program unit; it does not evaluate those in contained program units.
It would be possible to write a program to do this dependency analysis, but the analysis would need to be performed after any CPP preprocessing, as the dependencies depend on the if-defs (especially in JULES). At this point you are writing software which pre-processes and does the dependency analysis, then writes a configuration file for CMake, which in turn writes another configuration file on which you then call make. This seems a rather complicated way of doing things: it uses two off-the-shelf pieces of software, adds two of our own (extract, plus pre-process and dependency analysis), and requires making them talk to each other (and ensuring they continue to do so as each package is updated, although make itself changes slowly).
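The missing step described above - giving the system a target program and letting it work out which source files are needed - is just a transitive closure over a file-level dependency graph. A sketch, assuming a graph of the shape `{file: set of files it depends on}` (the function name is hypothetical):

```python
def files_for_target(target, graph):
    """Return the set of source files needed to build `target`.

    Walks the dependency graph iteratively from the target, so files
    not reachable from it are never compiled. This is the step CMake
    does not do for you: it expects the file list up front.
    """
    needed, todo = set(), [target]
    while todo:
        f = todo.pop()
        if f not in needed:
            needed.add(f)
            todo.extend(graph.get(f, ()))
    return needed
```

A wrapper around CMake would run this after pre-processing and dependency analysis, then emit the resulting file list into the CMake configuration - the extra plumbing layer the paragraph above describes.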
CMake can go and locate libraries and tools itself, rather than them having to be specified in a configuration file or set by an environment variable; this can simplify maintenance, for example by removing hard-coded paths to a library when a new version is released. A lot of the power of CMake is in macros written in a hard-to-read scripting language, and the documentation is poor - changes can have unforeseen side-effects so using CMake generally requires a lot of trial-and-error. This could cause a large maintenance and support overhead.
[2] An experiment building the UM found that it was about 10-20% slower than an identical compilation using fcm-make. More investigation would be required to understand why this happens; it may be because of make calling itself recursively. Changing from make to Ninja may speed up this process.
ECMWF's build system ecbuild is essentially a wrapper script and macro library for CMake, so most of what was discussed above applies here too. In addition, it has site-specific assumptions in its logic, which may require some work to generalise.
This suffers a similar issue to CMake in that it requires a list of files to be specified, and Fortran feels like an afterthought.
This appears to be a framework for building build systems, rather than a build system itself. It might be worth investigating further if we decide to go ahead with implementing our own build system.
At the time of writing the source code for FoBiS.py was a mess and did not comply with PEP 8. This does not give much confidence in the software; it could take a lot of effort to improve.
The key advantage of in-house software in this context is that it can be customised to meet our needs exactly. There's also the potential for responsive local support, and problems such as licensing are non-issues. The primary disadvantage is that there is an opportunity cost in assigning developers to write a build system rather than concentrating on other priorities, and the cost of ongoing support for this system, which is likely to be higher than maintaining a deployment of third-party software and just supporting Met Office extensions.
There is also the additional effort for partners in installing and configuring proprietary software, rather than relying on standard tools; however, any of these solutions would require using Met Office extensions to the base software anyway (though, depending on the nature of these extensions, they could be submitted to the project for inclusion in their next release). It is worth noting that the current fcm-make is the best dependency analyser of Fortran we know of, and we rely on this heavily.
If rewriting fcm-make is the chosen option, it should be rewritten in Python rather than Perl, and having a defined API would be a strong nice-to-have, to allow additional customisation and flexibility by calling parts of the system from our own scripts. If we choose to develop our own in-house build system, this would become the third build system written by METOMI and we can draw on the experience and lessons learned from previous attempts. If this system were to be Open Source software, and we put adequate resources into publicising it, this could be good for the Met Office's reputation, and further reduce the maintenance overhead as other developers contributed to the project.
There is no option which requires no resource investment locally. The key decision is whether to invest resources in extending third-party software to develop incomplete and/or missing features, or whether this resource would be better spent writing our own solution which meets our needs exactly.
We strongly believe that extending the current fcm-make tool to add the additional functionality is not a sustainable solution in the long term. Whilst that may provide a build tool which meets the additional requirements of the LFRic project quickly, the long-term support of Perl code is a problem, and thus doing this does not get around the "single point of expertise" issue. There is no intention to remove the current version of fcm-make, nor change the version control aspects of FCM.
It is worth noting that none of the third-party options provides an extract system that can merge code, which we would have to write ourselves (or modify existing software, e.g. Rose's file-creation mode). We believe that separate extract and build tools are desirable; these tasks are conceptually separate. There should however be an agreed API or standardised file for the extract tool to pass information to the build tool, so the information can be plumbed through.
Given the analysis we have done, SCons can be ruled out as a viable candidate, as it would require almost as much resource as writing our own build system from scratch. Using CMake as the basis for a new build system is a possibility, but it could become unmaintainable as we try to keep four different pieces of software working together.
Our recommendation is to proceed with a rewrite of fcm-make in Python. This may take more up-front effort but will lead to a system which best meets our needs in the future and is likely to have a lower maintenance overhead.