From a0afd1611d4bd312eeb7c393669525e0066fede7 Mon Sep 17 00:00:00 2001
From: Marten Lienen
Date: Sat, 31 Aug 2024 20:27:41 +0200
Subject: [PATCH] Add a doc page about the different adjoints

---
 docs/backprop.md | 24 ++++++++++++++++++++++++
 mkdocs.yml       |  1 +
 2 files changed, 25 insertions(+)
 create mode 100644 docs/backprop.md

diff --git a/docs/backprop.md b/docs/backprop.md
new file mode 100644
index 0000000..c67d61b
--- /dev/null
+++ b/docs/backprop.md
@@ -0,0 +1,24 @@
+# Gradients & Backpropagation
+
+There are two ways to compute gradients of the solution of an ODE with respect to its
+dynamics, i.e. the neural network in the case of neural ODEs. The first is to
+backpropagate straight through the solver. After all, an ODE solver is just a series of
+simple operations that define a dynamic computation graph, which PyTorch's autograd can
+backpropagate through. This is implemented in `to.AutoDiffAdjoint`, so called because it
+uses the autodiff/autograd mechanism. In general, this is the preferred method as long as
+you have enough memory, because it is fast and gives accurate gradients.
+
+If you run out of memory, you can instead compute gradients by solving the so-called
+adjoint equations, which essentially solve the ODE backwards in time and accumulate
+gradients along the way. This is implemented in `to.BacksolveAdjoint`. Solving the
+adjoint equations requires gradients of the model at different points in time, which
+`to.BacksolveAdjoint` computes with `torch.func`. If your model is not compatible with
+`torch.func` and you get errors because of this, you can fall back to
+`to.JointBacksolveAdjoint`. It computes the model gradients with PyTorch's usual autograd
+and should always work, but comes with two caveats. First, `to.JointBacksolveAdjoint` has
+to solve the `n` independent adjoint equations as one joint system with a shared
+discretization. This breaks with torchode's approach of solving each ODE completely
+independently, because the shared discretization introduces a subtle coupling between the
+solutions of your batch of ODEs. Second, it is only applicable if all ODEs in your batch
+share the same evaluation points. Therefore, `to.JointBacksolveAdjoint` should be your
+backpropagation choice of last resort.
diff --git a/mkdocs.yml b/mkdocs.yml
index 91bc887..37d9fc0 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -12,6 +12,7 @@ plugins:
 nav:
   - "Introduction": README.md
   - step-size-controllers.md
+  - backprop.md
   - jit.ipynb
   - extra-args.md
   - extra-stats.ipynb
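
To make the `to.AutoDiffAdjoint` path described in the new page concrete, here is a minimal sketch of solving a batched ODE with torchode and backpropagating through the solver. It follows the usage pattern from torchode's README; the toy dynamics function, tolerances, and loss are illustrative choices, not part of the patch.

```python
import torch
import torchode as to

# Toy dynamics; in a neural ODE this would be a torch.nn.Module whose
# parameters receive gradients.
def f(t, y):
    return -0.5 * y

y0 = torch.tensor([[1.2], [5.0]], requires_grad=True)
t_eval = torch.stack((torch.linspace(0, 5, 10), torch.linspace(3, 4, 10)))

term = to.ODETerm(f)
step_method = to.Dopri5(term=term)
step_size_controller = to.IntegralController(atol=1e-6, rtol=1e-3, term=term)

# Backpropagate straight through the solver steps via autograd.
adjoint = to.AutoDiffAdjoint(step_method, step_size_controller)

sol = adjoint.solve(to.InitialValueProblem(y0=y0, t_eval=t_eval))

# sol.ys is part of the autograd graph, so a loss on it backpropagates
# through every solver step; memory grows with the number of steps taken.
loss = sol.ys.square().mean()
loss.backward()
```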
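Swapping in the backsolve adjoints might look roughly as follows, continuing the sketch above. The constructor arguments shown for `to.BacksolveAdjoint` and `to.JointBacksolveAdjoint` are assumptions for illustration only; check the classes' actual signatures in torchode before relying on them.

```python
# Hypothetical sketch: trade compute for memory by solving the adjoint
# equations instead of backpropagating through the solver. The constructor
# arguments below are assumed, not taken from the patch.
adjoint = to.BacksolveAdjoint(term, step_method, step_size_controller)

# If torch.func cannot handle your model, fall back to the joint variant,
# which uses regular autograd but discretizes the batch's adjoint equations
# as one joint system (see the caveats in the page above).
# adjoint = to.JointBacksolveAdjoint(term, step_method, step_size_controller)

sol = adjoint.solve(to.InitialValueProblem(y0=y0, t_eval=t_eval))
```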