Use `clad::array_ref` for primal array arguments #471

parth-07 (Collaborator) · 2022-07-16T21:12:59Z

`clad::array_ref` origin and usage

`clad::array_ref` was introduced to make the size of the derivatives of array arguments available at runtime. This information was required to correctly differentiate function calls in reverse-mode AD. For a more detailed description of the problem, please see issue #242.

Briefly, the solution to the problem described in issue #242 is that users specify the size of the derivatives of array arguments when calling `CladFunction::execute(...)`. They do so with the `clad::array_ref` wrapper, which holds both the derivative array and its size. Currently, the array passed to `clad::array_ref` must be at least as large as the specified size. Typical usage of `clad::array_ref` when differentiating a function in reverse mode is as follows:

```cpp
#include "clad/Differentiator/Differentiator.h"
#include <iostream>

double fn(double *arr, int n) { /* ... */ }

int main() {
  // clad::gradient returns a CladFunction object, not a double
  auto fn_grad = clad::gradient(fn, "arr");
  double arr[5] = {1, 2, 3, 4, 5};
  double d_arr[5] = {};
  // array_ref wrapper for the derivative
  clad::array_ref<double> ref(d_arr, /*size=*/5);
  fn_grad.execute(arr, 5, ref);

  // printing derivatives
  for (auto i = 0U; i < 5; ++i)
    std::cout << d_arr[i] << " ";
}
```
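To make the wrapper's role concrete, here is a minimal sketch of a non-owning pointer-plus-size view. `ArrayRefSketch` is a hypothetical stand-in written for illustration, not Clad's actual `clad::array_ref` implementation; it only models the two pieces of information the wrapper carries, plus the element-wise `+=` that the simplified-model adjoint code in this discussion relies on (`_d_arr += _grad0`).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a non-owning array view such as
// clad::array_ref: it stores only a pointer and a size, never owns memory.
template <typename T>
class ArrayRefSketch {
public:
  ArrayRefSketch(T* ptr, std::size_t size) : ptr_(ptr), size_(size) {}

  T& operator[](std::size_t i) const { return ptr_[i]; }
  std::size_t size() const { return size_; }
  T* ptr() const { return ptr_; }

  // Element-wise accumulation into the viewed memory; in this sketch both
  // views must have the same size.
  ArrayRefSketch& operator+=(const ArrayRefSketch& other) {
    assert(size_ == other.size_ && "size mismatch");
    for (std::size_t i = 0; i < size_; ++i)
      ptr_[i] += other.ptr_[i];
    return *this;
  }

private:
  T* ptr_;
  std::size_t size_;
};
```

Because the view does not own its memory, writes through it land directly in the user's `d_arr` buffer, which is why the example above can print `d_arr` after calling `execute`.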

Comprehensive reverse-mode AD model -- pullback functions

A while ago, we updated Clad's reverse-mode AD to support differentiating function calls with reference arguments. With this update, functions called inside the function being differentiated are themselves differentiated using a more comprehensive reverse-mode model. We call the derivative functions generated with this model pullback functions, and the differentiation mode the pullback mode; the naming is inspired by the pullback functions of ChainRules. A pullback function lets the reverse-mode differentiation of the caller "continue" through the call: it receives the adjoint of the call's result and updates the adjoints of the caller's arguments. This makes it possible to compute derivatives correctly when arguments are passed by reference or by pointer. For more information about pullback functions, please visit: https://gist.github.com/parth-07/6f654dc1423866f024629ddb5b8b9506.
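A toy example may help show what "continuing" the reverse pass through a call means when an argument is passed by reference. The sketch below is hand-written C++, not Clad output; `square_ref_pullback` and its parameters are hypothetical. The pullback takes the caller's adjoint for the reference argument by pointer and updates it in place, so the caller's reverse sweep simply chains the pullbacks of its calls in reverse order.

```cpp
#include <cassert>

// Primal with a reference argument: overwrites x with x * x.
void square_ref(double& x) { x = x * x; }

// Hand-written pullback sketch (hypothetical name and signature).
// x_in is the value x held *before* the call; d_x is the caller's adjoint
// for x, passed by pointer and updated in place. On entry *d_x is the
// adjoint of x after the call; on exit it is the adjoint of x before it.
void square_ref_pullback(double x_in, double* d_x) {
  *d_x = *d_x * 2 * x_in;  // chain rule through x_out = x_in * x_in
}

// Caller: y = (x^2)^2 computed through two calls; returns dy/dx.
double caller_grad(double x) {
  double x0 = x;  // forward sweep: record x before each call
  square_ref(x);
  double x1 = x;
  square_ref(x);  // now x == x0^4, and y is the final x

  double d_x = 1.0;               // seed: dy/dy = 1
  square_ref_pullback(x1, &d_x);  // reverse sweep, last call first
  square_ref_pullback(x0, &d_x);
  return d_x;                     // equals 4 * x0^3
}
```

No temporary gradient variable is created for `x`: the same adjoint storage flows through every pullback.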

How the comprehensive reverse-mode AD model affects `clad::array_ref` usage.

Earlier, we used the simplified reverse-mode model to differentiate all functions. A derivative function generated with the simplified model expects all of its derivative arguments to be initialised to zero. For array arguments, this in turn requires creating zero-initialised derivative arrays at least as large as the corresponding primal arrays. Thus, to compute the gradient of a function with array arguments using the simplified model, the size of the array arguments is required. The comprehensive reverse-mode model, i.e. pullback functions, removes the need for the size of the derivatives of array arguments when differentiating function calls. This is because arrays are always passed by reference or pointer in C++, and no temporary variables are needed to store the gradients of arguments passed by reference or pointer: the pullback updates the caller's derivative storage directly. Therefore, users do not need to specify the size of derivative array arguments, and we no longer need the `clad::array_ref` wrapper for derivatives.

An illustrative example that shows the differentiation of a function call before and after introducing the pullback mode is as follows.

Primal code:

```cpp
double res = FnCall(arr, n);
```

Reverse-sweep adjoint code when `FnCall` is differentiated using the simplified reverse-mode model:

```cpp
// _d_arr is of 'clad::array_ref' type
clad::array_ref<double> _grad0(_d_arr.size());
double _grad1 = 0;
FnCall_grad(arr, n, _grad0, &_grad1);
_d_arr += _grad0;
_d_n += _grad1;
```

Reverse-sweep adjoint code when `FnCall` is differentiated using the pullback mode:

```cpp
double _grad1 = 0;
// _d_arr can be a raw pointer; we do not need its size information anywhere
FnCall_pullback(arr, n, _d_res, _d_arr, &_grad1);
_d_n += _grad1;
```
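The two snippets above can be turned into a small, self-contained program to check that both models agree. The names mirror the snippets, but the code is hand-written for illustration: `FnCall` is assumed to compute a sum of squares, `FnCall_grad` stands in for a simplified-model gradient that writes into zero-initialised storage, and `FnCall_pullback` stands in for the pullback. A `std::vector` plays the role of the sized temporary `_grad0`, and this sketch scales by the incoming adjoint `d_res` explicitly before accumulating. Only the simplified sweep needs the size of `d_arr`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed primal: FnCall(arr, n) = sum_i arr[i]^2.
// Simplified-model gradient: expects zero-initialised outputs and writes
// d(FnCall)/d(arr[i]) into them, as if the incoming adjoint were 1.
void FnCall_grad(const double* arr, int n, double* d_arr, double* d_n) {
  for (int i = 0; i < n; ++i)
    d_arr[i] += 2 * arr[i];
  (void)d_n;  // n is an integer size; nothing to propagate
}

// Pullback-model derivative: scales by the incoming adjoint d_res and
// accumulates straight into the caller's adjoint storage.
void FnCall_pullback(const double* arr, int n, double d_res,
                     double* d_arr, double* d_n) {
  for (int i = 0; i < n; ++i)
    d_arr[i] += d_res * 2 * arr[i];
  (void)d_n;
}

// Reverse sweep for `double res = FnCall(arr, n);` with the simplified model.
// The temporary must be allocated, so the size of d_arr has to be known.
void sweep_simplified(const double* arr, int n, double d_res,
                      double* d_arr, std::size_t d_arr_size) {
  std::vector<double> grad0(d_arr_size, 0.0);  // zero-initialised temporary
  double grad1 = 0;
  FnCall_grad(arr, n, grad0.data(), &grad1);
  for (std::size_t i = 0; i < d_arr_size; ++i)
    d_arr[i] += d_res * grad0[i];  // scale by d_res, then accumulate
}

// Same reverse sweep with the pullback model: no size of d_arr is needed.
void sweep_pullback(const double* arr, int n, double d_res, double* d_arr) {
  double grad1 = 0;
  FnCall_pullback(arr, n, d_res, d_arr, &grad1);
}
```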

Do we still need `clad::array_ref`?

We have just established that reverse-mode AD does not need to know the size of the derivatives of arrays. It does, however, need to know the size of the primal arrays. This is because, in reverse-mode AD, values are recorded in the forward sweep and then used in the reverse sweep to evaluate the derivative expressions. Recording values in the forward sweep is a crucial part of reverse-mode AD. In C++, to record an array, or in more basic terms, to copy it, we need to know its size. When arrays are passed as arguments to a function, they decay to raw pointers and the size information is lost; therefore, we need some mechanism to preserve the size of array arguments. For more information about this issue, please see #429.

We can use the `clad::array_ref` wrapper to pass the size of primal array arguments to the derivative function, in the same way as we currently use `clad::array_ref` for derivative arrays.
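The sketch below illustrates both points with hand-written code rather than Clad output. The primal function overwrites its array, so the reverse sweep needs the original values; the derivative function records them by copying the array, which is only possible because the hypothetical `PrimalArray` wrapper supplies the element count. A tool generally cannot tell which integer parameter, if any, holds an array's length, which is why the size travels with the wrapper here. The derivative array, by contrast, is a raw pointer whose size is never consulted. `PrimalArray`, `fn`, and `fn_grad` are illustrative names, not Clad API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pointer-plus-size wrapper for a *primal* array argument,
// standing in for clad::array_ref under the proposed convention.
struct PrimalArray {
  double* data;
  std::size_t size;
};

// Primal function as written by the user: squares the first n elements
// in place and returns their sum.
double fn(double* arr, int n) {
  double y = 0;
  for (int i = 0; i < n; ++i) {
    arr[i] *= arr[i];
    y += arr[i];
  }
  return y;
}

// Sketch of a derivative function under the proposed convention. The
// primal array arrives with its size, which is used to record (copy) it
// before the forward sweep overwrites it. d_arr stays a raw pointer: its
// size is never needed. Assumes n <= arr.size.
double fn_grad(PrimalArray arr, int n, double* d_arr) {
  std::vector<double> tape(arr.data, arr.data + arr.size);  // record primal
  double y = fn(arr.data, n);                               // forward sweep
  for (int i = 0; i < n; ++i)
    d_arr[i] += 2 * tape[i];  // reverse sweep: d(x_i^2)/dx_i uses old x_i
  return y;
}
```

Since only the primal size is used, a caller is free to pass a derivative buffer larger than the primal array.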

Why should we change the current API when we can just add a rule such that the size specified for derivative arrays should be exactly the same as the corresponding primal arrays?

If the derivative function could assume that the derivative array and the primal array have the same size, the problem of copying arrays passed as arguments would be solved without changing the current API. However, I propose changing the API to use `clad::array_ref` for primal array arguments instead of derivative arrays. My reasons are:

1. It is not always intuitive that the primal array and the derivative array have the same size. Users may decide to use a `clad::array_ref` larger than the corresponding primal array.
2. Since the derivative function never actually uses the size of derivative array arguments, it is misleading to make users pass that size.
3. API consistency: forward-mode AD also needs the `clad::array_ref` wrapper for primal array arguments.

`clad::array_ref` usage in the forward mode AD.

Currently, we cannot use forward-mode AD for functions with array parameters. This is because, in forward-mode AD, the derivative function must create derivatives of its arguments, and creating the derivatives of array arguments requires their size. Passing the size of primal array arguments to the derivative function will solve this issue.
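To see why forward mode needs the primal size, consider a hand-written forward-mode derivative of a sum-of-squares function with respect to one array element. The derivative function has to create a derivative (tangent) array for `arr`, seeded with 1 at the chosen index, and it can only allocate that array if it knows how many elements `arr` has. The name `sum_sq_darg0` and the `(pointer, size, index)` parameters are assumptions for illustration; the plain `size` parameter stands in for the size information a primal `clad::array_ref` argument would carry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Primal: y = sum_i arr[i]^2
// Forward-mode sketch: derivative of y with respect to arr[k].
// The tangent array d_arr needs one entry per primal element,
// so the primal size is required to create it.
double sum_sq_darg0(const double* arr, std::size_t size, std::size_t k) {
  std::vector<double> d_arr(size, 0.0);  // tangents of arr, all zero...
  d_arr[k] = 1.0;                        // ...except the seed direction
  double d_y = 0;
  for (std::size_t i = 0; i < size; ++i)
    d_y += 2 * arr[i] * d_arr[i];        // product rule for arr[i] * arr[i]
  return d_y;
}
```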

This proposal involves breaking changes in both forward-mode and reverse-mode AD, but the changes themselves are very simple to implement in Clad.

Everyone, please share your thoughts and suggestions. I look forward to discussing this topic further and improving the usage and design as much as we can.
