All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. This changelog does not include internal changes that do not affect the user.
- Fixed a bug introduced in v0.4.0 that could cause `backward` and `mtl_backward` to fail with some tensor shapes.
- Changed how the Jacobians are computed when calling `backward` or `mtl_backward` with `parallel_chunk_size=1` to not rely on `torch.autograd.vmap` in this case. Whenever `vmap` does not support something (compiled functions, RNN on cuda, etc.), users should now be able to avoid using `vmap` by calling `backward` or `mtl_backward` with `parallel_chunk_size=1`.
- Changed the effect of the parameter `retain_graph` of `backward` and `mtl_backward`. When set to `False`, it now frees the graph only after all gradients have been computed. In most cases, users should now leave the default value `retain_graph=False`, no matter what the value of `parallel_chunk_size` is. This will reduce the memory overhead.
- RNN training usage example in the documentation.
- Improved the performance of the graph traversal function called by `backward` and `mtl_backward` to find the tensors with respect to which differentiation should be done. It now visits every node at most once.
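
The "at most once" property of the traversal can be illustrated with a minimal pure-Python sketch on a toy graph. The `Node` class and the `find_leaves` helper below are hypothetical and are not torchjd's implementation; they only show how a visited set keeps the number of visits bounded by the number of nodes, even when a node is reachable through several paths.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be stored in a set
class Node:
    """Hypothetical stand-in for a node of a computation graph."""

    name: str
    parents: list["Node"] = field(default_factory=list)


def find_leaves(roots: list[Node]) -> tuple[list[Node], int]:
    """Return the leaves reachable from `roots` and the number of node visits."""
    visited: set[Node] = set()
    leaves: list[Node] = []
    stack = list(roots)
    visits = 0
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already handled through another path
        visited.add(node)
        visits += 1
        if not node.parents:
            leaves.append(node)
        stack.extend(node.parents)
    return leaves, visits


# Diamond-shaped graph: `out` depends on `a` and `b`, which both depend on `w`.
w = Node("w")
a = Node("a", [w])
b = Node("b", [w])
out = Node("out", [a, b])

leaves, visits = find_leaves([out])
print([leaf.name for leaf in leaves], visits)  # ['w'] 4
```

Without the visited set, `w` would be processed twice in this graph, and the number of repeated visits can grow quickly in deeper graphs with many shared sub-expressions.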
- Added a default value to the `inputs` parameter of `backward`. If not provided, the `inputs` will default to all leaf tensors that were used to compute the `tensors` parameter. This is in line with the behavior of torch.autograd.backward.
- Added a default value to the `shared_params` and to the `tasks_params` arguments of `mtl_backward`. If not provided, the `shared_params` will default to all leaf tensors that were used to compute the `features`, and the `tasks_params` will default to all leaf tensors that were used to compute each of the `losses`, excluding those used to compute the `features`.
- Note in the documentation about the incompatibility of `backward` and `mtl_backward` with tensors that retain grad.
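
The default values described above for `shared_params` and `tasks_params` can be sketched on a toy dependency graph. This is only an illustration under assumptions: the graph is a plain dictionary of names, it contains parameters only, and the `leaves_of` helper is hypothetical rather than part of torchjd.

```python
def leaves_of(node: str, parents: dict[str, list[str]]) -> set[str]:
    """Collect the names of the leaves reachable from `node` (leaves have no parents)."""
    seen: set[str] = set()
    leaves: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        node_parents = parents.get(current, [])
        if not node_parents:
            leaves.add(current)
        stack.extend(node_parents)
    return leaves


# Toy graph: one shared weight produces the features, and each loss has its own head weight.
parents = {
    "features": ["w_shared"],
    "loss1": ["features", "w_head1"],
    "loss2": ["features", "w_head2"],
}

# Default shared parameters: every leaf used to compute the features.
shared_params = leaves_of("features", parents)

# Default task parameters: the leaves of each loss, minus those used for the features.
tasks_params = [leaves_of(loss, parents) - shared_params for loss in ("loss1", "loss2")]

print(sorted(shared_params), [sorted(p) for p in tasks_params])
# ['w_shared'] [['w_head1'], ['w_head2']]
```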
- BREAKING: Changed the name of the parameter `A` to `aggregator` in `backward` and `mtl_backward`.
- BREAKING: Changed the order of the parameters of `backward` and `mtl_backward` to make it possible to have a default value for `inputs` and for `shared_params` and `tasks_params`, respectively. Usages of `backward` and `mtl_backward` that rely on the order between arguments must be updated.
- Switched to the PEP 735 dependency groups format in `pyproject.toml` (from a `[tool.pdm.dev-dependencies]` to a `[dependency-groups]` section). This should only affect development dependencies.
- BREAKING: Added a check in `mtl_backward` to ensure that `tasks_params` and `shared_params` have no overlap. Previously, the behavior in this scenario was quite arbitrary.
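
A check of this kind could look like the following sketch. The `check_no_overlap` helper, its error type, and its message are assumptions made for illustration; they are not torchjd's actual code. Parameters are compared by identity, because two distinct tensors may hold equal values.

```python
def check_no_overlap(shared_params: list, tasks_params: list[list]) -> None:
    """Raise ValueError if an object appears both in `shared_params` and in a task's params."""
    shared_ids = {id(p) for p in shared_params}
    for task_index, params in enumerate(tasks_params):
        if any(id(p) in shared_ids for p in params):
            raise ValueError(f"tasks_params[{task_index}] overlaps with shared_params")


# Plain objects stand in for parameter tensors.
w_shared, w_head1, w_head2 = object(), object(), object()

check_no_overlap([w_shared], [[w_head1], [w_head2]])  # disjoint groups: passes silently

try:
    check_no_overlap([w_shared], [[w_head1, w_shared]])
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(overlap_detected)  # True
```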
- PyTorch Lightning integration example.
- Explanation about Jacobian descent in the README.
- Made the dependency on ecos explicit in pyproject.toml (before `cvxpy` 1.16.0, it was installed automatically when installing `cvxpy`).
- Removed upper cap on `numpy` version in the dependencies. This makes `torchjd` compatible with the most recent numpy versions too.
- Prevented `IMTLG` from dividing by zero during its weight rescaling step. If the input matrix consists only of zeros, it will now return a vector of zeros instead of a vector of `nan`.
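
The kind of guard described above can be sketched as follows, assuming that the rescaling step divides the weights by their sum. The `rescale_weights` helper and its `eps` threshold are hypothetical and do not reproduce `IMTLG`'s actual computation.

```python
def rescale_weights(weights: list[float], eps: float = 1e-12) -> list[float]:
    """Rescale `weights` so that they sum to one, returning zeros if their sum is (near) zero."""
    total = sum(weights)
    if abs(total) < eps:
        # With tensors, 0 / 0 would yield nan here; return zeros instead.
        return [0.0] * len(weights)
    return [w / total for w in weights]


print(rescale_weights([1.0, 3.0]))  # [0.25, 0.75]
print(rescale_weights([0.0, 0.0]))  # [0.0, 0.0]
```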
- `autojac` package containing the backward pass functions and their dependencies.
- `mtl_backward` function to make a backward pass for multi-task learning.
- Multi-task learning example.
- BREAKING: Moved the `backward` module to the `autojac` package. Some imports may have to be adapted.
- Improved documentation of `backward`.
- Fixed wrong tensor device with `IMTLG` in some rare cases.
- BREAKING: Removed the possibility of populating the `.grad` field of a tensor that does not expect it when calling `backward`. If an input `t` provided to backward does not satisfy `t.requires_grad and (t.is_leaf or t.retains_grad)`, an error is now raised.
- BREAKING: When using `backward`, aggregations are now accumulated into the `.grad` fields of the inputs rather than replacing those fields if they already existed. This is in line with the behavior of `torch.autograd.backward`.
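
The two breaking changes above can be illustrated with a small pure-Python sketch. The `ToyTensor` class and the `accumulate_grad` helper are hypothetical stand-ins that mimic only the relevant tensor attributes; the error type is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTensor:
    """Hypothetical stand-in exposing only the attributes used by the check."""

    requires_grad: bool = True
    is_leaf: bool = True
    retains_grad: bool = False
    grad: Optional[float] = None


def accumulate_grad(t: ToyTensor, value: float) -> None:
    """Add `value` to `t.grad`, refusing tensors that do not expect a `.grad` field."""
    if not (t.requires_grad and (t.is_leaf or t.retains_grad)):
        raise ValueError("This tensor does not expect its .grad field to be populated.")
    # Accumulate rather than overwrite when a gradient is already stored.
    t.grad = value if t.grad is None else t.grad + value


param = ToyTensor()
accumulate_grad(param, 1.5)
accumulate_grad(param, 2.0)
print(param.grad)  # 3.5

intermediate = ToyTensor(is_leaf=False)  # non-leaf that does not retain grad
try:
    accumulate_grad(intermediate, 1.0)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```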
- Basic project structure.
- `aggregation` package:
  - `Aggregator` base class to aggregate Jacobian matrices.
  - `AlignedMTL` from Independent Component Alignment for Multi-Task Learning.
  - `CAGrad` from Conflict-Averse Gradient Descent for Multi-task Learning.
  - `Constant` to aggregate with constant weights.
  - `DualProj` adapted from Gradient Episodic Memory for Continual Learning.
  - `GradDrop` from Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout.
  - `IMTLG` from Towards Impartial Multi-task Learning.
  - `Krum` from Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent.
  - `Mean` to average the rows of the matrix.
  - `MGDA` from Multiple-gradient descent algorithm (MGDA) for multiobjective optimization.
  - `NashMTL` from Multi-Task Learning as a Bargaining Game.
  - `PCGrad` from Gradient Surgery for Multi-Task Learning.
  - `Random` from Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning.
  - `Sum` to sum the rows of the matrix.
  - `TrimmedMean` from Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates.
  - `UPGrad` from Jacobian Descent for Multi-Objective Optimization.
- `backward` function to perform a step of Jacobian descent.
- Documentation of the public API and of some usage examples.
- Tests:
  - Unit tests.
  - Documentation tests.
- Plotting utilities to verify qualitatively that aggregators work as expected.
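
As a rough illustration of what the simplest aggregators listed above compute, the following sketch sums and averages the rows of a small matrix stored as a list of lists. The `sum_rows` and `mean_rows` functions are hypothetical helpers, not the `Sum` and `Mean` classes themselves, which operate on tensors.

```python
def sum_rows(matrix: list[list[float]]) -> list[float]:
    """Sum the rows of `matrix`, giving one value per column."""
    return [sum(column) for column in zip(*matrix)]


def mean_rows(matrix: list[list[float]]) -> list[float]:
    """Average the rows of `matrix`, giving one value per column."""
    return [total / len(matrix) for total in sum_rows(matrix)]


# Each row plays the role of the gradient of one objective.
jacobian = [[1.0, 2.0], [3.0, 6.0]]

print(sum_rows(jacobian))   # [4.0, 8.0]
print(mean_rows(jacobian))  # [2.0, 4.0]
```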