# Configuration file options

In the constraints and defaults below, `resolution` denotes the image side length specified by `nii_target_shape`.

| Key | Type | Constraints | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| `total_epochs` | int | > 0 | True | 1 | Total number of epochs to train the model for |
| `batch_size` | int | > 0 | True | 1 | Number of training points in each minibatch |
| `nii_target_shape` | list[int] | > 0, power of 2, length == 3 | True | [128, 128, 128] | Resolution of the training images along each dimension |
| `latents_per_channel` | list[int] | > 0, length == log2(resolution) + 1 | True | [log2(resolution) + 1, log2(resolution), log2(resolution) - 1, …, 1] | Misnomer: should be `latent_feature_maps_per_resolution`. Number of latent feature maps at each resolution |
| `channels_per_latent` | list[int] | > 0, length == log2(resolution) + 1 | True | [20, …, 20] | Number of channels per latent feature map |
| `channels` | list[int] | > 0, length == log2(resolution) + 1 | True | [20, 40, 60, …, 20 × (log2(resolution) + 1)] | Number of output channels in the encoder's ResNet blocks |
| `kernel_sizes_bottom_up` | list[int] | > 0, length == log2(resolution) + 1 | True | [3, …, 3, 2, 1] | At each resolution (in decreasing order), the side lengths of the encoder's kernels |
| `kernel_sizes_top_down` | list[int] | > 0, length == log2(resolution) + 1 | True | [3, …, 3, 2, 1] | At each resolution (in decreasing order), the side lengths of the decoder's kernels |
| `channels_hidden` | list[int] | > 0, length == log2(resolution) + 1 | True | Set equal to `channels` | Number of intermediate channels in the encoder's ResNet blocks |
| `channels_top_down` | list[int] | > 0, length == log2(resolution) + 1 | True | Set equal to `channels` | Number of output channels in the decoder's ResNet blocks |
| `channels_hidden_top_down` | list[int] | > 0, length == log2(resolution) + 1 | True | Set equal to `channels` | Number of intermediate channels in the decoder's ResNet blocks |
| `warmup_iterations` | int | > 0 | False | 50 | Number of iterations to wait before skipping excessively large gradient updates |
| `plot_recons_period` | int | > 0 | False | 1 | Frequency (in epochs) with which to plot reconstructions |
| `subjects_to_plot` | int | > 0 | False | 4 | Number of subjects to include when plotting reconstructions |
| `validation_period` | int | > 0 | False | 1 | Frequency (in epochs) with which to evaluate the model on the validation set |
| `save_period` | int | > 0 | False | 1 | Frequency (in epochs) with which to save checkpoints |
| `l2_reg_coeff` | float | > 0 | False | 1e-4 | Coefficient scaling the L2 regularisation term in the objective |
| `learning_rate` | float | > 0 | False | 1e-3 | Scalar controlling the magnitude of stochastic gradient steps |
| `train_frac` | float | in [0, 1] | False | 0.95 | Fraction of the data to use for training; the remainder is used for validation |
| `gradient_clipping_value` | float | > 0 | False | 1e2 | Upper limit on the gradient norm, used when clamping gradients before applying gradient updates |
| `gradient_skipping_value` | float | > 0 | False | 1e12 | If the gradient norm exceeds this value, skip that iteration's gradient update |
| `sequence_type` | str | {"flair", "dwi"} | False | "flair" | Deprecated; the first step towards removing it is to always set it to "flair". (The architecture is not varied based on the imaging modality.) |
| `likelihood` | str | {"Gaussian", ?} | False | "Gaussian" | Choice of likelihood function. It is unclear whether anything other than "Gaussian" is fully implemented |
| `variance_hidden_clamp_bounds` | list[float] | positive | False | [0.001, 1] | Misnomer: should be `std_clamp_bounds_hidden`. Lower and upper bounds on the standard deviation of the prior and posterior Gaussian distributions of the latent variables |
| `variance_output_clamp_bounds` | list[float] | positive | False | [0.01, 1] | Misnomer: should be `std_clamp_bounds_output`. Lower and upper bounds on the standard deviation of the Gaussian distribution of the input given the latent |
| `latents_per_channel_weight_sharing` | str \| list[bool] | {"none", "all"}, or length == log2(resolution) + 1 | False | "none" | If not "none", at each resolution (in decreasing order) True means use a shared set of weights at that resolution to predict the latents at that resolution |
| `latents_to_use` | str \| list[bool] | {"none", "all"}, or length == log2(resolution) + 1 | False | "all" | If not "none", at each resolution (in decreasing order) False means replace the latents at that resolution with the output of deterministic ResNet blocks |
| `latents_to_optimise` | str \| list[bool] | {"none", "all"}, or length == log2(resolution) + 1 | False | "all" | If not "none", at each resolution (in decreasing order) False means withhold from the optimiser the parameters of the layers that predict the latents at that resolution |
| `half_precision` | bool | | False | False | Whether to train the model using 16-bit floating-point precision |
| `print_model` | bool | | False | False | Whether to display a text representation of the model at the start of training |
| `use_tanh_output` | bool | | False | True | Use tanh as the activation function when predicting the location of the Gaussian distribution of the input given its latent; the alternative is `torch.sigmoid()` |
| `new_model` | bool | | False | True | Should always be True. It originally supported backwards compatibility with an older architecture |
| `use_abs_not_square` | bool | | False | False | Use the absolute difference rather than the sum of squares when computing the log-likelihood |
| `plot_gradient_norms` | bool | | False | True | Plot the norms of the gradients after each epoch |
| `apply_mask_in_input_space` | bool | | False | False | Deprecated. This used to turn cost-function masking on and off, but it no longer does anything |
| `include_mask_in_loader` | bool | | False | False | Deprecated (refers to the same masking as above) |
| `resume_from_checkpoint` | bool | | False | False | Resume training from a checkpoint |
| `restore_optimiser` | bool | | False | True | When resuming training, restore the state of the optimiser (set to False to reset the optimiser's parameters and start training from epoch 1) |
| `keep_every_checkpoint` | bool | | False | True | Save, and keep, a checkpoint every epoch rather than keeping only the latest one |
| `predict_x_var` | bool | | False | True | Model the scale, not just the location, of the Gaussian distribution of the input given its latent |
| `use_precision_reweighting` | bool | | False | False | Re-weight the locations and scales of the prior and posterior distributions of the latents as in the Ladder VAE paper |
| `verbose` | bool | | False | True | Print more output |
| `bottleneck_resnet_encoder` | bool | | False | True | In the encoder, use a three-layer ResNet block whose middle layer has fewer channels than the output layer (the bottleneck); otherwise, use a two-layer ResNet block whose layers have equal numbers of output channels |
| `normalise_weight_by_depth` | bool | | False | True | Normalise each convolution block's randomly initialised kernel parameters by the (square root of the) depth of that block |
| `zero_biases` | bool | | False | True | Set each convolution block's bias to zero after initialising it |
| `use_rezero` | bool | | False | False | Use skip connections in which the non-skip part of the layer is multiplied by a scalar initialised to zero, as in the ReZero paper |
| `veto_batch_norm` | bool | | False | True | Do not use batch normalisation anywhere |
| `veto_transformations` | bool | | False | False | Do not apply augmentations to the training data |
| `use_nii_data` | bool | | False | True | Deprecated. Should always be True: the only data consumed comes in the form of NIfTI files (.nii) |
| `nifti_standardise` | bool | | False | True | Appears to be deprecated (this would have controlled z-scoring of the input) |
| `shuffle_niftis` | bool | | False | False | Randomise the order of the list of NIfTIs before splitting it into training and test sets |
| `save_recons_to_mat` | bool | | False | False | Deprecated |
| `use_DDP` | bool | | False | True | Deprecated |
| `convolutional_downsampling` | bool | | False | False | Downsample using stride-two convolutions rather than ×2 nearest-neighbour downsampling |
| `predict_x_var_with_sigmoid` | bool | | False | True | Predict the scale of the Gaussian distribution of the input given its latent using a (scaled) sigmoid, rather than predicting the natural logarithm of the scale and then exponentiating |
| `base_recons_on_train_loader` | bool | | False | False | When plotting reconstructions, reconstruct the training data rather than the validation data |
| `only_use_one_conv_block_at_top` | bool | | False | False | Use a truncated sequence of layers to predict, from the latents, the location and scale of the Gaussian distribution of the input given its latent |
| `separate_hidden_loc_scale_convs` | bool | | False | False | Rather than using a single convolutional block with a two-channel output to predict the location and scale of the prior and posterior Gaussian distributions of the latents, use separate blocks for the location and the scale |
| `separate_output_loc_scale_convs` | bool | | False | False | As above, but for the distribution of the input given its latent |
| `discard_abnormally_small_niftis` | bool | | False | True | Deprecated |
| `apply_augmentations_to_validation_set` | bool | | False | False | Apply to the validation set the same augmentations applied to the training set |
| `visualise_training_pipeline_before_starting` | bool | | False | True | Plot examples of the augmented training points before training begins |
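For concreteness, here is a minimal configuration sketch for 128³ inputs, written as a Python dict (illustrative only: the repository's actual config file format, and the non-default values shown, are assumptions rather than recommendations). With a side length of 128, log2(128) + 1 = 8, so every per-resolution list has eight entries, ordered from the highest resolution to the lowest.

```python
# Illustrative configuration sketch; key names are from the table above,
# values are hypothetical. Side length 128 gives log2(128) + 1 = 8 resolutions.
config = {
    # Required keys
    "total_epochs": 100,
    "batch_size": 2,
    "nii_target_shape": [128, 128, 128],
    "latents_per_channel": [8, 7, 6, 5, 4, 3, 2, 1],  # default pattern [log2(res) + 1, ..., 1]
    "channels_per_latent": [20] * 8,
    "channels": [20, 40, 60, 80, 100, 120, 140, 160],
    "kernel_sizes_bottom_up": [3, 3, 3, 3, 3, 3, 2, 1],
    "kernel_sizes_top_down": [3, 3, 3, 3, 3, 3, 2, 1],
    "channels_hidden": [20, 40, 60, 80, 100, 120, 140, 160],           # defaults to 'channels'
    "channels_top_down": [20, 40, 60, 80, 100, 120, 140, 160],         # defaults to 'channels'
    "channels_hidden_top_down": [20, 40, 60, 80, 100, 120, 140, 160],  # defaults to 'channels'
    # A few of the optional keys
    "learning_rate": 1e-3,
    "train_frac": 0.95,
    "latents_per_channel_weight_sharing": "none",  # or a list[bool], one entry per resolution
    "latents_to_use": "all",                       # e.g. [True] * 7 + [False]
    "half_precision": False,
}
```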

## Options for specifying model architecture

### Bottom-up graph

- `channels`
- `channels_hidden`
- `kernel_sizes_bottom_up`

### Top-down graph

- `channels_top_down`
- `channels_hidden_top_down`
- `channels_per_latent`
- `latents_per_channel`
- `kernel_sizes_top_down`
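All of these per-resolution lists must have length log2(resolution) + 1, so they are easy to generate programmatically. The helper below is a hypothetical sketch (not part of the codebase) that builds mutually consistent lists following the default patterns listed in the table:

```python
import math

def default_architecture_lists(side_length: int, base_channels: int = 20) -> dict:
    """Hypothetical helper: build default per-resolution lists for a cubic
    input of the given side length, following the table's default patterns."""
    assert side_length >= 4 and side_length & (side_length - 1) == 0, \
        "side length must be a power of two"
    n = int(math.log2(side_length)) + 1  # number of resolutions
    channels = [base_channels * (i + 1) for i in range(n)]  # [20, 40, ..., 20 * n]
    return {
        "latents_per_channel": list(range(n, 0, -1)),      # [n, n - 1, ..., 1]
        "channels_per_latent": [base_channels] * n,        # [20, ..., 20]
        "channels": channels,
        "channels_hidden": channels,                       # set equal to 'channels'
        "channels_top_down": channels,                     # set equal to 'channels'
        "channels_hidden_top_down": channels,              # set equal to 'channels'
        "kernel_sizes_bottom_up": [3] * (n - 2) + [2, 1],  # [3, ..., 3, 2, 1]
        "kernel_sizes_top_down": [3] * (n - 2) + [2, 1],
    }

# For 128^3 inputs (n = 8): latents_per_channel == [8, 7, 6, 5, 4, 3, 2, 1], etc.
architecture_defaults = default_architecture_lists(128)
```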