In the context of machine learning, many optimization algorithms rightfully preclude the presence of NaN values.
The documentation of a function may or may not mention whether it can return NaN, or how it processes NaN as input.
Alas, this is not systematically described, and people will just try functions left and right when they are doing exploratory feature engineering.
The first focus would be to make sure the library offers some batteries included for those who don't want to find out "too late" in the pipeline (pipelines take a long time to set up, adjust, run, troubleshoot, etc.).
Without going too far toward making things perfect and maximally sophisticated everywhere, there is a plan that could bring some safety and long-term maintainability:
- Offer an FSharp.Stats.NumericallySafe module (people open it after FSharp.Stats and it shadows the unchecked variants); there could also be a module with assertions that would defensively throw.
- The module would call the existing APIs but wrap the values in a type that enforces inspection via pattern matching or helper functions, borrowing idioms from F# core around option or result.
- The existing API should have CLR attributes on the functions / methods signalling "emits NaN" or "accepts NaN".
- There would be property-based tests, possibly guided by code coverage, that validate behavior against the presence of those attributes.
- There would be a page in the documentation that lists all the functions, with filters for "emits NaN" and the other attributes.
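To make the plan a bit more concrete, here is a minimal sketch of the shadowing idea. Everything in it is hypothetical: `EmitsNaNAttribute`, the `Stats` stand-in module, `NumericallySafe` and `NumericallyAsserted` are illustration names only, and `Stats.mean` is a toy function, not the FSharp.Stats implementation.

```fsharp
open System

// Hypothetical marker attribute: flags a function that may return NaN.
[<AttributeUsage(AttributeTargets.Method)>]
type EmitsNaNAttribute() =
    inherit Attribute()

// Stand-in for an existing unchecked API (a toy, not the library's code).
module Stats =
    [<EmitsNaN>]
    let mean (xs: seq<float>) : float =
        let sum, n = xs |> Seq.fold (fun (s, n) x -> (s + x, n + 1)) (0.0, 0)
        sum / float n // 0.0 / 0.0 is NaN for empty input

// Hypothetical checked variant: same name, but the caller must inspect the result.
module NumericallySafe =
    let mean (xs: seq<float>) : Result<float, string> =
        let r = Stats.mean xs
        if Double.IsNaN r then Error "mean produced NaN" else Ok r

// Hypothetical asserting variant: same signature as the original, throws on NaN.
module NumericallyAsserted =
    let mean (xs: seq<float>) : float =
        match NumericallySafe.mean xs with
        | Ok v -> v
        | Error msg -> invalidOp msg

// Opening the safe module after the unchecked one shadows `mean`.
open Stats
open NumericallySafe

let emptyResult = mean Seq.empty       // an Error case
let fineResult = mean [ 1.0; 2.0; 3.0 ] // Ok 2.0
```

Because the safe variant returns a Result, code written against the unchecked signature stops compiling once the safe module is opened, which is the point: the NaN case has to be handled explicitly.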
One can dream :)
In the meantime:
I wanted to point out that meanGeometric can emit NaN but the documentation says nothing about this, and it is not exposed under FSharp.Stats.NumericallySafe.
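For illustration, here is how a geometric mean ends up emitting NaN. This is a sketch that assumes the common exp-of-mean-of-logs formulation; `geometricMean` is a local stand-in written for this example, and the library's actual meanGeometric may be implemented differently.

```fsharp
open System

// Local stand-in, not the FSharp.Stats function:
// geometric mean computed as exp of the arithmetic mean of the logs.
let geometricMean (xs: seq<float>) : float =
    let sumLog, n = xs |> Seq.fold (fun (s, n) x -> (s + log x, n + 1)) (0.0, 0)
    exp (sumLog / float n)

let regular = geometricMean [ 1.0; 4.0; 16.0 ]   // approximately 4.0
let negative = geometricMean [ 2.0; -8.0 ]       // log of a negative number is NaN
let empty = geometricMean Seq.empty              // 0.0 / 0.0 is NaN
let poisoned = geometricMean [ 1.0; nan; 3.0 ]   // NaN in the input propagates

printfn "negative: %b, empty: %b, poisoned: %b"
    (Double.IsNaN negative) (Double.IsNaN empty) (Double.IsNaN poisoned)
```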
In the documentation pages, we'd want to display warning sections after the formula, logic and sample code, styled so that they catch the reader's attention.
<remarks>Returns NaN if data is empty or if any entry is NaN.</remarks>
I think we can ensure consistency based on the presence of this, which seems to be in place (but it is not really discoverable in code, nor in the documentation pages).
We can also define an F# analyzer that looks for functions like sqrt, which can produce NaN for some inputs.
If someone who groks maths (not me) could list here the F# and BCL functions used in this library that produce NaN, it would help with the implementation of such an analyzer.
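As a starting point, a few operations that return NaN under IEEE 754 can be checked directly. This is only a partial list for illustration, not an inventory of what the library actually calls; the `nanProducers` name is made up for this sketch.

```fsharp
open System

// Each entry pairs a description with an expression that evaluates to NaN.
let nanProducers : (string * float) list =
    [ "sqrt (-1.0)", sqrt (-1.0)
      "log (-1.0)", log (-1.0)
      "log10 (-1.0)", log10 (-1.0)
      "asin 2.0", asin 2.0
      "acos 2.0", acos 2.0
      "Math.Pow(-8.0, 1.0 / 3.0)", Math.Pow(-8.0, 1.0 / 3.0)
      "0.0 / 0.0", 0.0 / 0.0
      "infinity - infinity", infinity - infinity
      "infinity / infinity", infinity / infinity
      "0.0 * infinity", 0.0 * infinity ]

for (name, value) in nanProducers do
    printfn "%-28s NaN? %b" name (Double.IsNaN value)
```

Note that `nan = nan` evaluates to false, so an analyzer or a test has to use `Double.IsNaN` rather than an equality comparison.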
One issue with the open FSharp.Stats.NumericallySafe approach is that you can only switch in your code using #if preprocessor directives; otherwise, you need to pass references to functions, or rebind them in your own module based on some context.
There are scenarios where I'd want this to be done without recompiling or being forced to rebind each function of interest.
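One possible shape for such a runtime switch is sketched below. It is purely hypothetical: `NaNPolicy`, `NaNGuard` and `guardedSqrt` are names invented for this example, and a real design would likely want something more scoped than a single mutable global.

```fsharp
open System

// Hypothetical policy, selectable at run time instead of at compile time.
type NaNPolicy =
    | Propagate // return NaN as the unchecked function does
    | Throw     // fail as soon as a NaN is produced

module NaNGuard =
    // Could be set at startup, e.g. from configuration.
    let mutable policy = Propagate

    // Wraps any float-returning function and applies the current policy.
    let guard (name: string) (f: 'a -> float) (x: 'a) : float =
        let result = f x
        match policy with
        | Throw when Double.IsNaN result ->
            invalidOp (sprintf "%s produced NaN" name)
        | _ -> result

// The library would wrap each function once; callers do not rebind anything.
let guardedSqrt (x: float) : float = NaNGuard.guard "sqrt" sqrt x

let lenient = guardedSqrt (-1.0) // NaN under the default Propagate policy
```

Setting `NaNGuard.policy <- Throw` afterwards makes the same `guardedSqrt (-1.0)` call raise an InvalidOperationException, with no recompilation and no change at the call site.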
related: #280