Computation of numerically stable matrix operations #182
I'll leave the discussion of statistical functions to @AtheMathmo, as this certainly is not my area of expertise, but I have some input on the accuracy of floating point summation. First, I think this is an interesting idea, and if there is some need for this kind of feature (which I can imagine there is), I personally think this would be a nice addition to the library, so it gets my vote.

While summing floating point numbers in ascending order of magnitude certainly improves the accuracy of the result, it is still prone to a relatively large error bound (basically proportional to the number of summands).
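To make the magnitude-ordering point concrete, here is a small standalone sketch (not rulinalg code, just an illustration under assumed helper names `naive_sum` and `sorted_sum`) contrasting naive left-to-right accumulation with summation in ascending order of magnitude:

```rust
// Standalone illustration: many small terms are lost when added to an
// already-large partial sum, but survive if accumulated first.
fn naive_sum(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

fn sorted_sum(xs: &[f64]) -> f64 {
    let mut sorted = xs.to_vec();
    // Sort by magnitude so small terms are accumulated before large ones.
    sorted.sort_by(|a, b| a.abs().partial_cmp(&b.abs()).unwrap());
    sorted.iter().sum()
}

fn main() {
    // One large value followed by a thousand small ones.
    let mut xs = vec![1.0e16];
    xs.extend(std::iter::repeat(1.0).take(1_000));

    println!("naive:  {}", naive_sum(&xs));  // 1e16: every +1.0 is rounded away
    println!("sorted: {}", sorted_sum(&xs)); // 1.0000000000001e16: small terms survive
}
```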
That's absolutely true, Andlon! I would recommend using Kahan summation, since it has constant memory usage and is easy to implement efficiently; having to do the sorting step at all is already bad enough! One big question to me is what the API should look like: should the row_sum() and col_sum() functions be parameterized over the algorithm used? I could imagine different applications wishing to choose a la carte the techniques that improve accuracy.
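For reference, a minimal sketch of the textbook Kahan (compensated) summation mentioned here; this is not rulinalg's implementation, just an illustration of why it needs only constant extra state and no sorting:

```rust
/// Kahan (compensated) summation: O(1) extra state, no sorting required.
fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // running compensation for low-order bits lost in `sum`
    for &x in xs {
        let y = x - c;     // apply the compensation to the next term
        let t = sum + y;   // big + small: low-order bits of `y` may be lost...
        c = (t - sum) - y; // ...and are recovered here
        sum = t;
    }
    sum
}
```

Since Rust does not reassociate floating point arithmetic by default, the compensation term is not optimized away.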
Sorry for my delay in answering, I've been travelling. Here's some food for thought.

Kahan summation is, as you point out, already an accurate way of computing the sum, while at the same time maintaining high performance and low memory overhead. However, we need to decide on the desired semantics. Kahan summation essentially provides (much) more accurate summation than naive summation, whereas summation to full precision gives the most accurate result representable in floating point arithmetic. I can imagine that if someone actually cares about the accuracy of summation, then they might want it to be as exact as possible?

While parametrization over the summation algorithm sounds like a neat solution, the lack of default type parameters for generic methods in Rust causes some unnecessary friction for the default use case. In likely the vast majority of use cases, the user is not overly interested in the accuracy and just wants to use naive summation. Having to write something like

```rust
// Example naming - the point is that an extra `use` statement is necessary
// even for the typical use case of naive summation
use rulinalg::NaiveSummation;
let sums = matrix.sum_rows::<NaiveSummation>();
```

instead of

```rust
let sums = matrix.sum_rows();
```

would quickly become rather annoying. Here I just picked summation instead of mean computation, but the situation is the same. I think the impact on usability is so great that we need to provide separate methods for the algorithms provided. In your initial post, you proposed something like:

Note that I think parametrizing over the algorithm would probably be the best solution if we had support for default generic types, so we may want to switch to that sometime in the future.
Hello, I'm interested in computing the numerically stable mean & covariance on matrices. I noticed that the `unrolled_sum` function used at the core of the system is not pre-sorting the values from smallest to largest, which can cause small instabilities in the sum. Would you accept a contribution adding a `stable_mean` variant?

Would you also accept a function that computes the `cov` of a matrix, or do you think that since it's such a simple calculation with so many twists (corr vs cov, demean vs plain) it's not worth it?
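As a starting point for discussion, here is a hedged sketch of what a numerically stabler mean and covariance might look like; `stable_mean` and `stable_cov` are hypothetical names, and this is a plain two-pass, demeaned sample-covariance formulation rather than anything currently in rulinalg:

```rust
// Compensated mean plus two-pass covariance, sketched for illustration only.
fn stable_mean(xs: &[f64]) -> f64 {
    // Kahan-compensated accumulation keeps the rounding error of the sum small.
    let (mut s, mut c) = (0.0, 0.0);
    for &x in xs {
        let y = x - c;
        let t = s + y;
        c = (t - s) - y;
        s = t;
    }
    s / xs.len() as f64
}

fn stable_cov(xs: &[f64], ys: &[f64]) -> f64 {
    assert_eq!(xs.len(), ys.len());
    let (mx, my) = (stable_mean(xs), stable_mean(ys));
    // A second pass over centered data avoids the catastrophic cancellation
    // of the E[xy] - E[x]E[y] formulation.
    let sum: f64 = xs.iter().zip(ys).map(|(&x, &y)| (x - mx) * (y - my)).sum();
    sum / (xs.len() as f64 - 1.0) // sample covariance; population would divide by n
}
```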