clad::array_ref origination and usage
clad::array_ref was introduced to solve the problem of finding and using the size of derivatives of array arguments at runtime. This information was required to correctly differentiate function calls in the reverse mode AD. For more detail on the problem, please visit issue #242.
Briefly, the solution to the problem described in issue #242 is that users have to specify the size of derivatives of array arguments when calling CladFunction::execute(...). Users specify this size through the clad::array_ref wrapper, which holds both the derivative array and its size. The current requirement is that the array passed to clad::array_ref must be at least as large as the size specified. Typical usage of clad::array_ref when differentiating a function in the reverse mode is as follows:
double fn(double *arr, int n) { /* ... */ }
// clad::gradient returns a CladFunction object, so deduce its type with auto
auto fn_grad = clad::gradient(fn, "arr");
double arr[5] = {1, 2, 3, 4, 5};
double d_arr[5] = {};
// array_ref wrapper for the derivative
clad::array_ref<double> ref(d_arr, /*size=*/5);
fn_grad.execute(arr, 5, ref);
// printing derivatives
for (auto i = 0U; i < 5; ++i)
  std::cout << d_arr[i] << " ";
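To make the idea of a size-carrying wrapper concrete, here is a minimal stand-alone sketch. `ArrayRefSketch` and `sum_sq_grad` are hypothetical names invented for illustration; this is not Clad's actual `clad::array_ref` implementation or generated code. It only shows a pointer paired with a size, and a hand-written derivative that uses that size to decide how many derivative slots it may touch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a pointer-plus-size wrapper (NOT Clad's real class).
template <typename T>
struct ArrayRefSketch {
  T* data;
  std::size_t size;
  ArrayRefSketch(T* d, std::size_t n) : data(d), size(n) {}
  T& operator[](std::size_t i) {
    assert(i < size);  // the wrapped array must hold at least `size` elements
    return data[i];
  }
};

// Hand-written gradient of f(arr) = sum_i arr[i]^2, standing in for generated code.
// The only size it knows about is the one carried by the wrapper.
void sum_sq_grad(const double* arr, ArrayRefSketch<double> d_arr) {
  for (std::size_t i = 0; i < d_arr.size; ++i)
    d_arr[i] += 2 * arr[i];
}
```

A call such as `sum_sq_grad(arr, ArrayRefSketch<double>(d_arr, 5))` mirrors the shape of the `execute` call above: the derivative buffer and its size travel together.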
Comprehensive reverse-mode AD model -- pullback functions
A while ago, we updated Clad's reverse mode AD to support differentiating function calls that contain reference arguments. In this update, we began differentiating the functions used in function call expressions with a more comprehensive reverse-mode model. We call derivative functions generated using this model pullback functions, and the differentiation mode the pullback differentiation mode; the naming is inspired by ChainRules's pullback functions. The pullback function approach allows us to "continue" the reverse-mode automatic differentiation when required. This makes it possible to compute derivatives correctly when arguments are passed by reference or pointer. For more information about pullback functions, please visit: https://gist.github.com/parth-07/6f654dc1423866f024629ddb5b8b9506.
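As a rough illustration of what "continuing" the reverse mode means for a reference argument, consider the hand-written sketch below. All function names and signatures here are assumptions made for the example and do not necessarily match what Clad generates. The pullback receives the adjoint of the updated argument through `d_x` and turns it into the adjoint of the original argument, so the caller's reverse sweep can carry on from there.

```cpp
#include <cassert>

// Primal callee: mutates its argument through a reference.
void scale_in_place(double& x, double k) { x = x * k; }

// Hand-written pullback (illustrative only). On entry *d_x is the adjoint of
// the updated x; on exit it is the adjoint of the original x. *d_k accumulates.
void scale_in_place_pullback(double x, double k, double* d_x, double* d_k) {
  assert(d_x != nullptr && d_k != nullptr);
  double upstream = *d_x;
  *d_x = upstream * k;   // d(x * k)/dx = k
  *d_k += upstream * x;  // d(x * k)/dk = x (the original value of x)
}

// Caller: f(a, k) = (a * k)^2, computed via the in-place callee.
double f(double a, double k) {
  scale_in_place(a, k);
  return a * a;
}

// Hand-written gradient of f that reuses the callee's pullback.
void f_grad(double a, double k, double* d_a, double* d_k) {
  assert(d_a != nullptr && d_k != nullptr);
  // Forward sweep: record a before the callee overwrites it.
  double a0 = a;
  scale_in_place(a, k);
  // Reverse sweep: adjoint of the updated a coming from `return a * a`.
  double d_a_local = 2 * a;
  scale_in_place_pullback(a0, k, &d_a_local, d_k);
  *d_a += d_a_local;
}
```

For `a = 3` and `k = 2` this accumulates `2 * a * k^2 = 24` into `*d_a` and `2 * a^2 * k = 36` into `*d_k`.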
How the comprehensive reverse-mode AD model affects clad::array_ref usage
Earlier, we used the simplified reverse-mode model to differentiate all functions. A derivative function generated using the simplified reverse-mode model expects all derivative arguments to be initialised to zero. For array arguments, this in turn requires creating zero-initialised derivative arrays at least as large as the corresponding primal arrays. Thus, to compute the gradient of a function that contains array arguments using the simplified reverse-mode model, the size information of the array arguments is required. The comprehensive reverse-mode model, that is, pullback functions, removes the need to know the size of derivatives of array arguments when differentiating function calls. This is because arrays are always passed by reference/pointer in C++, and no temporary variables are needed to store gradients for arguments passed by reference/pointer. Therefore, users do not need to specify the size of derivative array arguments, and thus we do not need the clad::array_ref wrapper for the derivatives.
An illustrative example that shows the differentiation of a function call before and after introducing the pullback mode is as follows.
primal code:
double res = FnCall(arr, n);
reverse sweep adjoint code when 'FnCall' is differentiated using simplified reverse-mode model
// _d_arr is of 'clad::array_ref' type
clad::array_ref<double> _grad0(_d_arr.size());
double _grad1 = 0;
FnCall_grad(arr, n, _grad0, &_grad1);
_d_arr += _grad0;
_d_n += _grad1;
reverse sweep adjoint code when 'FnCall' is differentiated using the pullback mode
double _grad1 = 0;
// _d_arr can be a raw pointer; we do not need its size information anywhere
FnCall_pullback(arr, n, _d_res, _d_arr, &_grad1);
_d_n += _grad1;
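The two adjoint fragments above can be compared with a self-contained, hand-written sketch. The concrete callee `sum_sq`, the caller `g(arr) = 3 * sum_sq(arr, n)`, and all derivative function names are assumptions chosen for illustration; they are not Clad output. The point is only that the simplified model needs a zeroed temporary of a known size, while the pullback model accumulates straight into the caller's derivative pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Primal callee: sum of squares of the first n elements.
double sum_sq(const double* arr, int n) {
  double s = 0;
  for (int i = 0; i < n; ++i) s += arr[i] * arr[i];
  return s;
}

// Simplified model: assumes d_arr is zero on entry and the return adjoint is 1.
void sum_sq_grad(const double* arr, int n, double* d_arr) {
  for (int i = 0; i < n; ++i) d_arr[i] += 2 * arr[i];
}

// Pullback model: takes the upstream adjoint d_res and accumulates into d_arr.
void sum_sq_pullback(const double* arr, int n, double d_res, double* d_arr) {
  for (int i = 0; i < n; ++i) d_arr[i] += d_res * 2 * arr[i];
}

// Reverse sweep of g using the simplified model: a zero-initialised temporary
// is required, so the size of the derivative array (d_size) must be known.
void g_grad_simplified(const double* arr, int n, double* d_arr,
                       std::size_t d_size) {
  assert(d_size >= static_cast<std::size_t>(n));
  double d_res = 3.0;  // adjoint of the call result inside g
  std::vector<double> grad0(d_size, 0.0);
  sum_sq_grad(arr, n, grad0.data());
  for (std::size_t i = 0; i < d_size; ++i) d_arr[i] += d_res * grad0[i];
}

// Reverse sweep of g using the pullback model: no temporary and no size of
// the derivative array; d_arr can be a raw pointer.
void g_grad_pullback(const double* arr, int n, double* d_arr) {
  double d_res = 3.0;
  sum_sq_pullback(arr, n, d_res, d_arr);
}
```

Both variants produce the same derivatives; only the simplified one has to be told how large the derivative array is.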
Do we still need clad::array_ref?
We just established that to automatically differentiate a function using the reverse-mode AD, we do not need to know the size of derivatives of arrays. But we do need to know the size of the primal arrays. This is because, in reverse mode AD, values are recorded in the forward sweep and then used in the reverse sweep to generate derivative expressions. Recording values in the forward sweep is a crucial part of the reverse mode AD. In C++, to record an array, or in more basic terms, to copy an array, we need to know the size of the array. When arrays are passed as arguments to a function, they decay to raw pointers and the size information is lost. Therefore, we need some mechanism to preserve the size information of array arguments. For more information about this issue, please visit #429.
We can use the clad::array_ref wrapper to pass the size information of primal array arguments to the derivative function, in the same way as we currently use clad::array_ref for derivative arrays.
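A small hand-written sketch may help show why the primal size matters. `PrimalRef`, `square_all_and_sum`, and its gradient are hypothetical names made up for this example, and the wrapper is only a stand-in for passing a size alongside the primal pointer. The callee overwrites the array, so the forward sweep has to copy it first, and that copy is impossible without knowing how many elements to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a size-carrying wrapper around a primal array.
struct PrimalRef {
  double* data;
  std::size_t size;
};

// Primal: squares each element in place, then returns the sum.
double square_all_and_sum(double* arr, int n) {
  double s = 0;
  for (int i = 0; i < n; ++i) {
    arr[i] = arr[i] * arr[i];
    s += arr[i];
  }
  return s;
}

// Hand-written gradient (illustrative only).
void square_all_and_sum_grad(PrimalRef arr, int n, double* d_arr) {
  assert(static_cast<std::size_t>(n) <= arr.size);
  // Forward sweep: record the array before it is overwritten.
  // The copy needs arr.size; a raw double* alone would not be enough.
  std::vector<double> saved(arr.data, arr.data + arr.size);
  square_all_and_sum(arr.data, n);
  // Reverse sweep: d(arr[i]^2)/d(arr[i]) = 2 * (original arr[i]).
  for (int i = n - 1; i >= 0; --i) d_arr[i] += 2 * saved[i];
}
```

Note that the derivative pointer `d_arr` is passed raw here; only the primal array carries a size.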
Why should we change the current API when we can just add a rule such that the size specified for derivative arrays should be exactly the same as the corresponding primal arrays?
If the derivative function can assume that the derivative array and the primal array are of the same size, then the problem of copying arrays passed as arguments would be solved without changing the current API. But I would propose changing the API design to use clad::array_ref for primal array arguments instead of derivative arrays. My reasons are:
1. It is not always intuitive that the primal array and the derivative array are of the same size. Users may decide to use a clad::array_ref of a larger size than the corresponding primal array.
2. Since the derivative function does not actually require the size of derivative array arguments anywhere, it is misleading to make users pass size information of derivatives of array arguments.
3. API consistency: we also need the clad::array_ref wrapper for primal array arguments in the forward mode AD.
clad::array_ref usage in the forward mode AD.
Currently, we cannot use the forward mode AD for functions that contain array parameters. This is because, in the forward mode AD, we need to create the derivatives of the arguments inside the derivative function, and to create derivatives of array arguments we need the size of the array arguments. Passing the size information of primal array arguments to the derivative function will solve this issue.
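The following hand-written sketch illustrates the point under stated assumptions. The function `weighted_sq_sum` and its derivative `weighted_sq_sum_darg` are invented for the example and are not what Clad generates; the size is passed as a plain parameter here, whereas under the proposal it would come from the wrapper around the primal array. The derivative function has to allocate and seed the derivative array itself, which requires the primal size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Primal: f(arr) = sum_i (i + 1) * arr[i]^2.
double weighted_sq_sum(const double* arr, std::size_t n) {
  double s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += static_cast<double>(i + 1) * arr[i] * arr[i];
  return s;
}

// Hand-written forward-mode derivative of f with respect to arr[k].
double weighted_sq_sum_darg(const double* arr, std::size_t n, std::size_t k) {
  assert(k < n);
  // The derivative of the array argument is created inside the derivative
  // function, so its size n must be known here.
  std::vector<double> d_arr(n, 0.0);
  d_arr[k] = 1.0;  // seed: differentiate with respect to arr[k]
  double d_s = 0;
  for (std::size_t i = 0; i < n; ++i)
    d_s += static_cast<double>(i + 1) * 2 * arr[i] * d_arr[i];
  return d_s;
}
```

With a raw `const double*` and no size, the `d_arr` allocation above could not be written.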
The changes proposed in this discussion are breaking changes in both the forward and the reverse mode AD. But the changes themselves are very simple to implement in Clad.
Everyone, please share your thoughts and suggestions on this discussion. I am looking forward to discussing more on this topic and improving the usage and design as much as we can.