============================== Release Notes: v0.102 ==============================

Support for new training algorithms:
 - LTFB is now a first-class training algorithm.
 - LTFB now allows multiple metrics. The local algorithm is favored by each trainer and a partner model must win every metric to be declared the tournament winner.
 - The batched iterative optimizer (sgd_training_algorithm) was refactored for consistency.
 - Improved documentation of training algorithm infrastructure.

Support for new network structures:
 - ATOM WAE model - character-based Wasserstein Autoencoder
 - Community GAN model for graph data sets

Support for new layers:
 - "DFTAbs" layer that computes the absolute value of the channel-wise DFT of the input data
 - Added support for 3D matrix multiplication
 - Added scatter and gather neural network layers
 - CPU-based GRU layers using oneDNN
 - Added batch-wise reduce-sum
 - ArcFace loss

Python front-end:
 - Added 3D U-Net Model
 - Added Cosmoflow Model
 - Ported CANDLE Pilot1 models
 - Support nvprof
 - Added channelwise fully connected layer
 - Added support for non-square kernels, padding, stride, and dilation for the convolution module
 - Support for OpenMPI launcher

Performance optimizations:
 - Use cuDNN 8 RNN API and CUDA Graphs in GRU layer
 - Cache CUDA Graphs for each active mini-batch size
 - Tuned performance of slice, concatenate, and tessellate layers on ARM processors
 - Parallelized computation of Gaussian random numbers
 - Optimized tessellate, concatenate, and slice layers on CPU

Experiments & Applications:
 - Added experiment scripts for ATOM cWAE Gordon Bell simulations
 - LBANN-ATOM model inference and analysis

Internal features:
 - Wrapper classes for CUDA Graphs API
 - Elementary examples of using complex numbers
 - cuDNN handles are now wrapped in RAII management classes
 - Improved HWLOC compatibility for v1.11 and v2.x
 - Added an enum type of visitor hooks that will eventually be used to allow callbacks or other visitors to operate at user-defined hook points
 - Changed checkpoint logic to checkpoint at the start of epochs and changed the naming scheme to use the callback phase (visitor hook) in the name rather than the current execution context.
 - Added in-memory binary model exchange for LTFB.
 - Added support for ROCm and MIOpen
 - Added support for oneDNN
 - Updated the bamboo test environment to use local executables rather than hard-coded executables
 - Overhauled and refactored serialization throughout the code to use the Cereal serialization library
 - Significant cleanup and refactoring of the code base to improve compile times. Moving to ensure that code adheres to the standard split of headers between declaration and implementation functions (for templated code). Specifically focused on serialization functions and the comm class. Reduced dependencies introduced through overreaching header inclusions.
 - The relationship of execution_contexts and training_algorithms was clarified. There is still work to do here.
 - Added DistConv tests for both convolution and pooling layers
 - Support padding in distributed embedding layer
 - Added dump model graph callback
 - Added perturb learning rate callback
 - Added batched inference algorithm
 - Switched ATOM tests to use CPU embedding and tessellate layers to minimize noise

I/O & data readers:
 - Experimental data reader that generates graph random walks with HavoqGT
 - Added explicit tournament execution mode
 - Added support to split training data reader into validation and tournament readers
 - node2vec data reader

Build system:
 - Hydrogen v1.5.0+
 - Aluminum v0.5.0+
 - DiHydrogen v0.2.0 is required
 - C++14 or newer standard with CUDA (CMake: "-DCMAKE_CUDA_STANDARD=14")
 - OpenCV is now an optional dependency via CMake "LBANN_WITH_VISION"
 - CNPY is now an optional dependency via CMake "LBANN_WITH_CNPY"
 - Added support in the build_lbann.sh script for concretizing extra packages with the primary LBANN installation
 - New features in the build script to set up / configure the build environment, but stop and allow the user to manually add extra packages
 - Added a set of user-focused build scripts that use the main build_lbann.sh script to set up good defaults on known systems
 - Added application-specific build scripts for users such as ATOM
 - Added support for pulling from Spack mirrors and setting them up
 - Split embedded Python support from Python Front End
 - Switched Spack-based build script to use Spack's clingo concretizer

Bug fixes:
 - Fixed a bug where LBANN didn't set the Hydrogen RNG seed
 - Fixed both the CosmoFlow and UNet models in the PFE and addressed issues in the data reader and data coordinator.
 - Fixed the HDF5 data reader to properly specify the supported I/O types
 - Fixed calculation of the linearized response size
 - Fixed the data coordinator's interface to input_layer
 - Fixed error with deterministic execution of dropout layers

Retired features:
 - Removed deprecated JAG leader mode, which was made obsolete when the data reader moved into the data coordinator
 - Removed the deprecated partitioned data reader modes that were used to partition and overlap data sets for multiple models
 - Removed deprecated ActivationDescriptor class
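
Illustration of the multi-metric LTFB rule noted under "Support for new training algorithms": the sketch below shows one way to read "a partner model must win every metric". It is not LBANN code. The function name partner_wins, the dictionary layout, and the higher_is_better flags are hypothetical, and treating ties (and an empty metric set) as a win for the local model is an assumption drawn from the statement that the local side is favored.

```python
from typing import Dict


def partner_wins(
    local: Dict[str, float],
    partner: Dict[str, float],
    higher_is_better: Dict[str, bool],
) -> bool:
    """Return True only if the partner strictly beats the local model on every metric.

    Hypothetical helper for illustration; names and data layout are assumptions.
    """
    if not local:
        # Assumption: with nothing to compare, the favored local model is kept.
        return False
    for name, local_score in local.items():
        partner_score = partner[name]
        if higher_is_better[name]:
            better = partner_score > local_score
        else:
            better = partner_score < local_score
        if not better:
            # A tie or a loss on any single metric keeps the local model.
            return False
    return True


if __name__ == "__main__":
    directions = {"accuracy": True, "loss": False}
    local_scores = {"accuracy": 0.80, "loss": 0.50}
    partner_scores = {"accuracy": 0.85, "loss": 0.55}
    # The partner is better on accuracy but worse on loss, so the local model stays.
    print(partner_wins(local_scores, partner_scores, directions))
```

Under this reading, adding metrics can only make it harder for a partner model to replace the local one, since a single tie or loss is enough to keep the local model.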