Changing custom derivatives design #342

parth-07 · 2021-12-28T17:17:47Z

Collaborator

This discussion introduces a rule-based design for differentiating functions using Automatic Differentiation (AD) in Clad. It will supersede the current custom derivatives approach. This write-up introduces the underlying concepts and the associated API design, and provides a common place for discussing and gathering ideas to further improve the design. The rule-based design is inspired by the ChainRules.jl library and the Swift language.

The rule-based design will allow users to specify custom differentiation rules for any function or overloaded operator. The rule-based approach will have at least the following benefits:

It will facilitate differentiation of functions whose definitions are not available.
It will allow users to specify a more numerically stable or more efficient derivative of a function, one that the ordinary automatic differentiation transformation would not otherwise produce.

The ChainRules.jl documentation does a very good job of explaining the pushforward and pullback concepts. Please refer to the ChainRules.jl documentation here and here for an introduction to them.
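As a quick refresher, here is a minimal sketch of both concepts for f(x, y) = x * y, written in plain C++ without Clad (the names f_pushforward and f_pullback are illustrative only). The pushforward maps input tangents to the output tangent; the pullback maps the output adjoint back to the input adjoints.

```cpp
#include <cassert>

// Primal function: f(x, y) = x * y.
double f(double x, double y) { return x * y; }

// Pushforward (forward mode, Jacobian-vector product):
// given tangents dx and dy of the inputs, return the tangent of the output.
double f_pushforward(double x, double y, double dx, double dy) {
  return y * dx + x * dy;
}

// Pullback (reverse mode, vector-Jacobian product):
// given the adjoint d_f of the output, accumulate the adjoints of the inputs.
void f_pullback(double x, double y, double d_f, double& d_x, double& d_y) {
  d_x += d_f * y;
  d_y += d_f * x;
}
```

Seeding the pushforward with (dx, dy) = (1, 0) yields the partial derivative with respect to x only, whereas a single pullback call with d_f = 1 yields both partial derivatives at once.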

Proposed Solution

To support user-defined differentiation rules, we introduce two new C++ attributes that register a function as the pushforward or pullback of another function.

`clad::pushforwardOf(FnName)` attribute

The clad::pushforwardOf(FnName) attribute registers a function as the pushforward of the function specified by FnName.

For example:

double cube(double i) {
  return i*i*i;
}

// Registers `cube_pushforward` as the pushforward of `cube`.
[[clad::pushforwardOf(cube)]] 
double cube_pushforward(double i, double didx) {
  return 3 * i * i * didx;
}

Clad will use a registered pushforward function wherever it needs the derivative of the corresponding function, for example when that function is called inside a function being differentiated in forward mode.

A concrete example of how Clad uses a pushforward function:

double fn(double i, double j) {
  double i_cube = cube(i);
  double j_cube = cube(j);
  return i_cube + j_cube;
}

Clad will use the pushforward function in the synthesized forward-mode derived function of fn as follows:

double fn_darg0(double i, double j) {
  double _d_i = 1;
  double _d_j = 0;
  double _d_i_cube = cube_pushforward(i, _d_i);
  double i_cube = cube(i);
  double _d_j_cube = cube_pushforward(j, _d_j);
  double j_cube = cube(j);
  return _d_i_cube + _d_j_cube;
}
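Without the attribute (which needs Clad's plugin support), the example can be compiled and checked by hand. A minimal sketch, assuming the generated fn_darg0 behaves exactly as shown above: since fn(i, j) = i³ + j³, its derivative with respect to i should be 3i². The fn_darg0 below is a condensed, hand-written equivalent with the unused primal computations dropped.

```cpp
#include <cassert>

double cube(double i) { return i * i * i; }

// User-provided pushforward: d(i^3) = 3 * i^2 * di.
double cube_pushforward(double i, double didx) { return 3 * i * i * didx; }

double fn(double i, double j) { return cube(i) + cube(j); }

// Condensed hand-written equivalent of the generated fn_darg0:
// seed the tangent of i with 1 and the tangent of j with 0.
double fn_darg0(double i, double j) {
  double _d_i = 1, _d_j = 0;
  return cube_pushforward(i, _d_i) + cube_pushforward(j, _d_j);
}
```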

`clad::pullbackOf(FnName)` attribute

The clad::pullbackOf(FnName) attribute registers a function as the pullback of the function specified by FnName.

For example:

double cube(double i) {
  return i*i*i;
}

[[clad::pullbackOf(cube)]] 
void cube_pullback(double i, double dydx, double& d_i) {
  ...
  ...
}

Pullback functions can be designed in two distinct ways with slightly different behaviour. We need to decide which design to proceed with.

In the first design, the pullback function only provides the pullback values; it does not modify the actual derivative variables involved.

For example, consider this code snippet:

double cube(double i) {
  return i*i*i;
}

void cube_pullback(double i, double dydx, double& d_i) {
  d_i = 3*i*i*dydx;
}

With this implementation, the pullback function is used as described below:

y = cube(i);

This statement will be transformed as follows:

// forward pass
_t0 = i;
y = cube(i);

// reverse pass
_t1 = _d_y;
cube_pullback(_t0, _d_y, _t_di);
_d_i += _t_di;
_d_y -= _t1;
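To make the first design concrete, here is a self-contained sketch in plain C++ (no Clad). The function grad_cube_design1 is a hypothetical, hand-written stand-in for the reverse-mode code Clad would generate; it wraps the transformed statements above and seeds the output adjoint _d_y with 1. Note that the temporary _t_di has to be declared by the generated code.

```cpp
#include <cassert>

double cube(double i) { return i * i * i; }

// Design 1: the pullback only writes its result into an output parameter;
// it never touches the caller's adjoint variables.
void cube_pullback(double i, double dydx, double& d_i) { d_i = 3 * i * i * dydx; }

// Hypothetical hand-written gradient of y = cube(i) following the transformation above.
double grad_cube_design1(double i) {
  double _d_i = 0, _d_y = 1;  // seed the output adjoint
  // forward pass
  double _t0 = i;
  double y = cube(i);
  (void)y;
  // reverse pass
  double _t1 = _d_y;
  double _t_di = 0;  // extra temporary required by this design
  cube_pullback(_t0, _d_y, _t_di);
  _d_i += _t_di;
  _d_y -= _t1;
  return _d_i;
}
```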

In the second design, the pullback function updates the actual derivative variables involved instead of just providing the pullback values.

For example, consider this code snippet:

void cube_pullback(double i, double& dydx, double& didx) {
  double _t0 = dydx;
  didx += 3*i*i*dydx;  // accumulate into the adjoint of i
  dydx -= _t0;         // consume the adjoint of the output
}

With this implementation, the pullback function is used as described below:

y = cube(i);

This statement will get transformed as follows:

// forward pass
_t0 = i;
y = cube(i);

// reverse pass
cube_pullback(_t0, _d_y, _d_i);

This design is computationally less expensive because fewer additional temporary variables are involved.
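For comparison, here is the same self-contained sketch for the second design (plain C++, no Clad; grad_cube_design2 is again a hypothetical hand-written stand-in for generated code). The reverse pass collapses to a single call, and the pullback also zeroes the incoming output adjoint.

```cpp
#include <cassert>

double cube(double i) { return i * i * i; }

// Design 2: the pullback updates the caller's adjoints in place and
// consumes (zeroes) the incoming adjoint of the output.
void cube_pullback(double i, double& dydx, double& didx) {
  double _t0 = dydx;
  didx += 3 * i * i * dydx;
  dydx -= _t0;
}

// Hypothetical hand-written gradient of y = cube(i) using the second design.
double grad_cube_design2(double i) {
  double _d_i = 0, _d_y = 1;
  // forward pass
  double _t0 = i;
  double y = cube(i);
  (void)y;
  // reverse pass: a single call, no temporaries at the call site
  cube_pullback(_t0, _d_y, _d_i);
  return _d_i;
}
```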

pushforward and pullback functions defined by clad

If a user-defined pushforward/pullback function is not available, Clad will internally define one automatically to obtain the derivative of the function.

For example, the cube_pushforward and cube_pullback functions discussed above would be synthesised automatically by Clad whenever they are required and no user-defined rule is available.

pushforward and pullback as the basic differentiation building blocks

We can go one step further and make pushforward and pullback functions the basic building blocks of differentiation. A direct consequence is that the forward-mode and reverse-mode derived functions would be defined directly in terms of the corresponding pushforward and pullback functions.

For example,

double fn(double u, double v, double w) {
  …
  …
}

/**
 The forward mode derived function synthesised by Clad:
*/
double fn_darg0(double u, double v, double w) {
  return fn_pushforward(u, v, w, /*dudx=*/1, /*dvdx=*/0, /*dwdx=*/0);
}

/**
 The reverse mode derived function synthesised by Clad:
*/
void fn_grad(double u, double v, double w, clad::array_ref<double> d_u, clad::array_ref<double> d_v, clad::array_ref<double> d_w) {
  fn_pullback(u, v, w, /*dfdf=*/1, d_u, d_v, d_w);
}

A few major advantages of defining derived functions by directly forwarding differentiation to the pushforward and pullback functions are as follows:

We will be able to use @efremale's idea (#100) of not generating a forward-mode derived function for the same function multiple times for different parameters. For example, fn_darg1 differs from fn_darg0 only in the seed values:

double fn_darg1(double u, double v, double w) {
  return fn_pushforward(u, v, w, /*dudx=*/0, /*dvdx=*/1, /*dwdx=*/0);
}

If fn is used inside some other function fnB, Clad needs to generate the pushforward of fn. If fn is also differentiated directly (for example, with auto d_fn = clad::differentiate(fn, "u");), then currently the same function fn is differentiated twice: once for the pushforward and once for fn_darg0. This duplication can be avoided if fn_darg0 is defined using only the pushforward.
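The building-block idea can be sketched end to end in plain C++. The body of fn is elided in the discussion, so the one below (u * v + w * w) is a hypothetical choice for illustration, and plain references stand in for clad::array_ref. Every fn_dargN only seeds a different unit tangent into one shared fn_pushforward, and fn_grad only seeds the output adjoint of fn_pullback.

```cpp
#include <cassert>

// Hypothetical concrete body for fn; the discussion leaves it unspecified.
double fn(double u, double v, double w) { return u * v + w * w; }

// Single pushforward: tangent of fn for arbitrary input tangents.
double fn_pushforward(double u, double v, double w,
                      double dudx, double dvdx, double dwdx) {
  return v * dudx + u * dvdx + 2 * w * dwdx;
}

// Single pullback: accumulates the adjoints of all inputs from d_f.
void fn_pullback(double u, double v, double w, double d_f,
                 double& d_u, double& d_v, double& d_w) {
  d_u += d_f * v;
  d_v += d_f * u;
  d_w += d_f * 2 * w;
}

// Each forward-mode derivative just seeds a different unit tangent.
double fn_darg0(double u, double v, double w) { return fn_pushforward(u, v, w, 1, 0, 0); }
double fn_darg1(double u, double v, double w) { return fn_pushforward(u, v, w, 0, 1, 0); }
double fn_darg2(double u, double v, double w) { return fn_pushforward(u, v, w, 0, 0, 1); }

// The gradient seeds the output adjoint with 1.
void fn_grad(double u, double v, double w, double& d_u, double& d_v, double& d_w) {
  fn_pullback(u, v, w, /*dfdf=*/1, d_u, d_v, d_w);
}
```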

pushforward/pullback design and differentiation with respect to aggregate types

The pushforward/pullback design works very well for differentiating scalar types with respect to aggregate types. However, it will be challenging to apply this design to differentiating aggregate types with respect to aggregate types, so it is not a good fit for that case.
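For the scalar-with-respect-to-aggregate case, the shapes line up naturally. A minimal sketch with a hypothetical Vec2 type (not part of Clad): the tangent and the adjoint of the aggregate input are themselves Vec2 values, while the output tangent and adjoint are plain doubles.

```cpp
#include <cassert>

// Hypothetical aggregate type used for illustration.
struct Vec2 {
  double x, y;
};

// Scalar-valued function of an aggregate: squared norm.
double norm2(Vec2 v) { return v.x * v.x + v.y * v.y; }

// Pushforward: the tangent of the aggregate input is again a Vec2.
double norm2_pushforward(Vec2 v, Vec2 d_v) {
  return 2 * v.x * d_v.x + 2 * v.y * d_v.y;
}

// Pullback: the adjoint of the aggregate input has the same shape as the
// input, so it is accumulated into a Vec2 of adjoints.
void norm2_pullback(Vec2 v, double d_y, Vec2& d_v) {
  d_v.x += d_y * 2 * v.x;
  d_v.y += d_y * 2 * v.y;
}
```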

Any suggestions or comments regarding this discussion are welcome. Please feel free to ask any questions.

parth-07 · 2022-01-03T14:37:07Z

Collaborator Author

Discussion Update

This update is based on the discussion Vassil and I had a couple of days ago.

The attribute approach to specifying pushforward/pullback functions has several drawbacks that cannot be resolved trivially:

Clang currently does not parse the arguments of custom attributes defined by plugins. Therefore, we would need to wait for a future Clang version that adds this functionality to use this idea to its full potential.
Clang 10 and below do not support plugins registering custom attributes. Therefore, we would need to rely on macros to make the API compatible with all the Clang versions we aim to support (Clang 5+).
Passing the function name as a string literal argument to an attribute imposes the additional challenge of implementing a mini parser inside Clad to parse that argument. This would be difficult to implement and could become a big source of unexpected bugs. It is certainly better to avoid these problems with a different API design if possible.
It is difficult to specify the pushforward/pullback of overloaded and template functions.

In comparison, the custom derivatives approach that we currently employ does not suffer from most of these drawbacks.

Thus, for now, we have decided to provisionally go with the custom derivatives approach.

An example of the custom derivatives approach in action:

#include "clad/Differentiator/Differentiator.h" // provides clad::array_ref

template <typename T>
T cube(T x) {
  return x*x*x;
}

namespace clad {
  namespace custom_derivatives {
    template<typename T>
    T cube_pushforward(T x, T d_x) {
      return 3*x*x*d_x;
    }

    // Accumulates into *d_x and consumes *d_y; nothing is returned.
    template<typename T>
    void cube_pullback(T x, clad::array_ref<T> d_y, clad::array_ref<T> d_x) {
      T t0 = *d_y;
      *d_x += *d_y*3*x*x;
      *d_y -= t0;
    }
  }
}
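Because the custom derivatives are templates, one definition serves every floating-point type. The sketch below checks this without Clad: it uses a hypothetical namespace custom_derivatives_sketch in place of clad::custom_derivatives, and plain pointers in place of clad::array_ref<T>, since both support dereferencing with *.

```cpp
#include <cassert>

template <typename T>
T cube(T x) { return x * x * x; }

// Stand-in for clad::custom_derivatives; T* replaces clad::array_ref<T>
// so that the sketch compiles without Clad.
namespace custom_derivatives_sketch {
template <typename T>
T cube_pushforward(T x, T d_x) { return 3 * x * x * d_x; }

template <typename T>
void cube_pullback(T x, T* d_y, T* d_x) {
  T t0 = *d_y;
  *d_x += *d_y * 3 * x * x;  // accumulate into the adjoint of x
  *d_y -= t0;                // consume the adjoint of the output
}
}  // namespace custom_derivatives_sketch
```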

One place where we cannot directly use the custom derivatives approach is specifying the custom derivative of a member function. There are several strategies we could employ to solve this problem, such as using a fixed naming convention for pushforward/pullback functions (without the enclosing custom_derivatives namespace), or using inheritance to extend the original types with pushforwards/pullbacks. We still need to decide on the best way forward.
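To illustrate only the first strategy, here is a sketch of what a fixed naming convention could look like. The class Scaler and the <member>_pushforward convention are hypothetical; no such convention has been decided.

```cpp
#include <cassert>

// Hypothetical class with a member function to be differentiated.
struct Scaler {
  double k;

  double apply(double x) const { return k * x * x; }

  // Hypothetical convention: the pushforward sits next to the member
  // function as <name>_pushforward, inside the class itself rather than
  // in the clad::custom_derivatives namespace.
  double apply_pushforward(double x, double d_x) const { return 2 * k * x * d_x; }
};
```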


efremale · 2022-01-03T20:43:49Z


Cool, I think pushforward/pullback is the way to go in general (independently of how custom derivatives are implemented)
