Added LU decomposition with complete pivoting #160
Conversation
This resolves issue #151.
This is related to issue #151.
As discussed, changed "a point" to "an element" in the comments for the "get" family of functions. Also fixed the indentation of the new "get" and "get_mut" functions by changing tabs to spaces.
Added:
* LUPQ, the result of unpacking a complete-pivot LU decomposition.
* FullPivLu, the parallel of PartialPivLu.
* A number of tests for full pivoting.
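A hedged sketch of how the new API might be used, based on the list above (the `Result` return, the `unpack` destructuring, and the exact signatures are assumptions, not necessarily the final API):

```rust
// Sketch only: `a` is a square Matrix<f64>; names follow the PR description.
let lu = FullPivLu::decompose(a).expect("decomposition failed");

// Unpacking yields the four factors of the complete-pivot decomposition,
// which satisfy P * A * Q = L * U.
let LUPQ { l, u, p, q } = lu.unpack();
```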
I just had a brief glance at the code, and from what I could see I'm very happy with it. Great job! I'll try to have a more thorough look over the next few days, but I'm due to hand in my thesis next week, so please forgive me if I take long :)
Great, thanks! I actually have a question about errors and panics. Right now, […] Good luck with your thesis!
A quick glance also looks good to me! Thanks for turning this in so quickly! As for your question, I agree that this does seem like an inconsistency. My gut feeling is that we probably want to have both panic here, as it is a non-recoverable user error. What do you think @Andlon?
Sounds reasonable. Should trying to invert a non-invertible square matrix also be a panic?
Is LU decomposition also well-defined for rectangular matrices? Sorry, I haven't had the time to study the code in depth, but I hope perhaps you can answer the question of rectangular LU decomposition!
@Prismatica - No, I'd say this should be an error. In this case the user might have a matrix which should be theoretically invertible but due to numerical instability is not. Here it would be preferable to give them an error informing them that the matrix could not be inverted. For an example use case - we often have matrices in machine learning which are theoretically invertible but when computed may be singular. I prefer to see an error here so I can control the message to the user - perhaps suggesting a technique to ensure the matrix is non-singular (normally adding a small constant to the diagonal).
Good question. So far I'm happy restricting LUP to square matrices (unless @Prismatica has some insight here).
@Prismatica: inverting a non-invertible matrix should return a result. The idea is the following: a dimension mismatch is a programming error that the caller cannot meaningfully recover from, so it should panic, whereas singularity depends on the runtime data, so the caller should get a recoverable error.
OK, I will make sure that all the new functions panic on dimensionality mismatch, but not on singularity. As for rectangular matrices, I believe the current implementation already handles them (I've added some tests for the rectangular case).
@Prismatica: I recall there being some notes on LU for rectangular matrices in Golub & Van Loan. Have you checked what it says there? I don't have the opportunity to read up on it right now. If we want to provide LU for rectangular matrices, we need to know whether it's numerically stable or whether there are other issues (also, does it work equally well for tall and short matrices?). I've never used LU for anything but square matrices, so the rectangular case falls outside of my experience. If we can actually reasonably guarantee that our implementation(s) also work for rectangular matrices, I'm all for it! While it's great that you wrote some tests for this case, we just need some more evidence that it's supposed to work for all (?) rectangular matrices too, since unit tests only test a very, very small number of potential cases :)
I also want to say that I think it's perfectly acceptable to lock this down to square matrices if it turns out that rectangular matrices are problematic. In this case we could also revisit the topic in the future if we wanted to.
Here's what Golub and Van Loan have to say about LU on rectangular matrices (third edition): when discussing round-off errors, they say that "we mention that if A is m-by-n, then the theorem applies with n in (3.3.4) replaced by the smaller of n and m", where 3.3.4 is: [the screenshot here showed Theorem 3.3.1 from the text, i.e. that the computed factors satisfy L̂Û = A + H with |H| ≤ 3(n−1)u(|A| + |L̂||Û|) + O(u²), u being the unit round-off]. This result is only for the no-pivot case, and their error analysis for the complete-pivot algorithm only deals with square matrices.

I'd like to mention that Matlab and Eigen both support rectangular matrices in their LU (though I can't find what algorithm Matlab uses for its LU; it only has a single permutation matrix applied on the left, so I assume it's partial pivoting).

All that said, I'd of course be happy to back off the rectangular matrix support if you all feel it's unsafe or premature!
From the resources you gave (thanks for those, by the way!) it seems that the rectangular case is treated much the same as the square case, and the same numerical stability guarantees are in place (more or less). Note that you said theorem 3.3.4 but gave theorem 3.3.1. However - I think that we should introduce rectangular support in both the partial and full cases simultaneously. This can either be part of this PR or of a new one. I think the latter option may make it easier to verify the results.
Alright, sounds good! So, to summarize, I will:

1. Make the new functions panic on dimensionality mismatch, but return an error on singularity.
2. Pull rectangular matrix support out of this PR, so that it can be revisited for both the partial- and full-pivoting cases in a future PR.

I can get to these things in the next few days. Also, I should have clarified: the (3.3.4) refers to the formula, not the theorem. It makes sense in context, but I was more-or-less arbitrarily copy-and-pasting :)
Hey, I finally found some time to review this.
Overall, I think you've done an excellent job! The code is very clean, and the documentation and comments are spot on.
As far as I can see, the only thing that needs to change is to avoid cloning the matrix in `decompose`. Otherwise, I just have a couple of questions.
I also wanted to say that I'm very excited about support for rectangular matrices! I think ultimately @AtheMathmo's suggestion of making it a separate PR was a good call, but I would very much welcome a PR for rectangular matrix support, if you still feel like tackling it!
Sorry for the delay. I've handed in my thesis now, so I should have some more time from now on :) (though perhaps not so much this weekend)
src/matrix/decomposition/lu.rs
Outdated
```rust
matrix.rows() == matrix.cols(),
        "Matrix must be square for LU decomposition.");

let mut lu = matrix.clone();
```
Since we take ownership of `matrix`, we don't need to clone it and can repurpose its memory for storing the L and U factors.
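A minimal standalone sketch of this point, with simplified stand-in types rather than rulinalg's actual API: because `decompose` receives the matrix by value, its buffer can be reused in place instead of cloned.

```rust
// Simplified stand-ins for illustration only.
struct Matrix { data: Vec<f64>, rows: usize, cols: usize }
struct FullPivLu { lu: Matrix }

impl FullPivLu {
    // `matrix` is taken by value, so its storage can hold the packed
    // L and U factors directly; no clone is required.
    fn decompose(matrix: Matrix) -> FullPivLu {
        let lu = matrix; // a move, not a copy of the data
        // ... elimination would overwrite `lu.data` with L (below the
        // diagonal) and U (on and above it) ...
        FullPivLu { lu }
    }
}

fn main() {
    let a = Matrix { data: vec![4.0, 3.0, 6.0, 3.0], rows: 2, cols: 2 };
    let _factor = FullPivLu::decompose(a); // `a` is moved, never cloned
}
```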
Awesome, thanks! I'll fix that in the next few days.
src/matrix/decomposition/lu.rs
Outdated
```rust
let (piv_row, piv_col, piv_val) = FullPivLu::select_pivot(&lu, index);

if piv_val.abs() < T::epsilon() {
    break;
```
It totally makes sense to do this, but I was just thinking that this might cause the `U` matrix not to have a sorted diagonal. Is this something we should think about...? I'm not sure if this matters or not. Do you have any thoughts?
I was thinking of `T::epsilon()` as a threshold for zero, so, under that assumption, we'd have something like this:

```
[x x x x
 0 x x x
 0 0 0 0
 0 0 0 0]
```

Does that make sense? I can change that to an exact check against `T::zero()`.
src/matrix/decomposition/lu.rs
Outdated
```rust
let mut inv = Matrix::zeros(n, n);
let mut e = Vector::zeros(n);

if self.rank() != n {
```
I think perhaps this is a little bit redundant, because `solve` will panic in the case that the matrix cannot be inverted. What do you think?
Sounds good! I was just thinking it was better documentation, but I think you're right, and avoiding an extra check is better. I'll fix that when I update the other things. Thanks!
Well, `solve` can fail on the `n`th iteration, in which case it would be preferable to fail early, so perhaps a cheaper check beforehand like the one you had would be a good idea. In that case, we could perhaps do even better than rank: we could implement something like `is_invertible`, which simply checks the value of the bottom-right corner of the `lu` matrix. But this doesn't have to be implemented for this PR :-)
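A hedged sketch of that idea with plain stand-in types (assumptions: `lu` is the packed square LU storage, `eps` the scaled tolerance discussed below). Because complete pivoting selects the largest remaining element at each step, the diagonal of U is non-increasing in magnitude, so only the bottom-right entry needs checking.

```rust
// Sketch, not the final implementation.
fn is_invertible(lu: &[Vec<f64>], eps: f64) -> bool {
    let n = lu.len();
    // Complete pivoting gives |U[0,0]| >= |U[1,1]| >= ..., so the
    // bottom-right diagonal entry is the smallest pivot.
    n == 0 || lu[n - 1][n - 1].abs() > eps
}

fn main() {
    let lu = vec![vec![4.0, 1.0], vec![0.5, 2.0]]; // packed L\U, toy data
    assert!(is_invertible(&lu, 1e-12));
}
```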
Makes sense! If we add the `is_invertible` function, I assume we would want to allow the user to specify a tolerance, as in the parallel discussion about `rank`. Would that mean that `solve` would also take a tolerance, to pass to `is_invertible`? That seems like it's getting a little complicated. I think we could probably just use the default you suggested, `small_multiplier * self.lu[[0, 0]].abs() * T::epsilon()`, rather than giving the user the option of setting their own. Does that make sense?
I agree with your assessment :-)
1) Remove unneeded `clone`. 2) Fix bad stopping condition. 3) Remove unneeded singularity check that is handled in another portion of the code.
About the exact vs. epsilon comparison: does Golub & Van Loan have anything to say on this (I'm traveling atm)? I agree that a check against some epsilon value makes sense, but for robustness we would really need to make sure we use the right criterion. A check against `T::epsilon()` alone takes no account of the scaling of the matrix. Note that both partial pivoting and e.g. triangular solves also suffer from this issue at the moment (they use `T::epsilon()` directly).

To sum it up, I think it's safer to just use an exact test for this case, because the pivoting will make sure that it actually works in a robust fashion (this is not the case for e.g. the triangular solves). I will try to make an issue on the case of epsilon values in general throughout the library at a later time!

Have to run now, but will see if I can find the time for a final review today.
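A tiny self-contained illustration of why a bare `T::epsilon()` check is not robust: it is an absolute threshold, so the same data uniformly rescaled can flip from "nonzero" to "zero".

```rust
fn main() {
    let x = 3.0_f64;
    let scaled = x * 1.0e-20; // same entry, different units/scale
    assert!(x.abs() > f64::EPSILON);      // treated as nonzero
    assert!(scaled.abs() < f64::EPSILON); // treated as "zero"
}
```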
src/matrix/decomposition/lu.rs
Outdated
```rust
pub fn rank(&self) -> usize {
    self.lu.diag().fold(
        0 as usize,
        |x, &y| if y.abs() > T::epsilon() { x + 1 } else { x })
```
I agree with the concerns previously discussed here. I think there are two options:

- Force the diagonal to be zero exactly in the Gaussian elimination.
- Carefully tune the epsilon value we use here (maybe something like `T::from(3.0) * T::min_value()` is more appropriate?)

I think I agree with @Andlon that this isn't really an issue we should try to solve in this PR. The same problem presents itself in different forms throughout the library. I would really appreciate an issue tracking these so we can try to find a good solution!
I think in this particular instance, the best measure I can think of would be to use something like:

```rust
// 2.0 is arbitrary
let small_multiplier = T::from(2.0).unwrap();
let eps = small_multiplier * self.lu[[0, 0]].abs() * T::epsilon();
```

This takes into account scaling of the matrix (I believe U(0, 0) is the largest element of the matrix, is it not?).
Basically, given any matrix `A`, define a matrix `B` such that `B = A / max(A)`. Then the largest element in `B` is 1, so using the machine epsilon (which roughly corresponds to representable accuracy between 1 and the next-greatest floating point number) is a reasonable choice in this case. Hence, one gets a tolerance of `eps * max(A)`.
As far as I can see, this coincides with the behavior of Eigen's FullPivLu::setThreshold functionality as well (though they seem to not use any small multiplier).
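A sketch of the scaled tolerance in plain `f64` terms (hypothetical helper; per the comment above, `lu[[0, 0]]` holds the largest-magnitude element under complete pivoting, standing in for `max(A)`):

```rust
// eps = |max element| * machine epsilon
fn scaled_tolerance(max_element: f64) -> f64 {
    max_element.abs() * f64::EPSILON
}

fn main() {
    // For a matrix whose largest element is ~1e6, the zero threshold
    // scales up accordingly instead of staying at ~2.2e-16.
    println!("{:e}", scaled_tolerance(1.0e6)); // ~2.2e-10
}
```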
Note that I mean only for the rank computation, not for the actual decomposition. I suppose using exactly zero for the decomposition is still reasonable here just to make sure we get maximum accuracy.
We might also want to expose functionality for the user to choose his own tolerance for determining effective/numerical rank of the matrix. Perhaps something like `rank_with_tol` or `rank_given_tol`. I must admit I'm not a big fan of Eigen's way of maintaining state in the decomposition object, and I'd rather prefer stateless functionality, to be honest. In any case, this does not need to be part of the PR, although if @Prismatica wants to, I wouldn't mind adding it onto this PR. What do you think @AtheMathmo?
This does seem like exactly the kind of function where a default parameter for the tolerance would be the ideal solution. In lieu of that, I think your proposals are good. I like the idea of scaling epsilon by the maximum - that seems like a good way to keep things simple but robust. We would need to be careful that this doesn't cause issues for large matrices though (where `eps * max` might be fairly large).

I think we should let @Prismatica decide how much work they want to put into this PR. I'm perfectly happy for the additional epsilon work to be pushed back for future work.
I don't quite follow your logic - what do you mean by `eps * max` being large for large matrices...? I assume by "large matrix" you mean that its dimensions are large? Sorry, I just can't quite make sense of the statement.

Also, yes, I think this can be pushed back to a future PR. Actually, it might be a good idea to do so, to keep PRs smaller and more focused.
Sorry - my language was not clear at all. I mean a matrix `A` such that `max(A) ~ 1/T::epsilon()`, for example. In this case, we have that our `eps ~ 1` - and so the zero check fails.

I'm not sure whether it is possible to produce such an `A` as part of the LUP decomposition, but I wanted to raise the issue just in case.
This all sounds good. The one concern I have is that there would need to be many `foo_with_tol` functions: `rank_with_tol`, `inverse_with_tol`, `is_invertible_with_tol`, etc. Does anyone know how far out default parameter support is? Would it make sense to either defer allowing users to specify the tolerance, or to force the users to do so, with the plan to change the tolerance to default parameters when that is available? If you all don't want to wait, I can either add the tolerance work in this PR or another one soon, depending on how you all want to proceed.
An alternative idea is to define an enum:

```rust
enum Tolerance<T> { Internal, Specified { tol: T } }
```

And then have `rank` take an instance of the `Tolerance` enum. If the user wants the function to calculate the tolerance, then they pass `Internal`. When default parameter support finally shows up, we can streamline the API by making the tolerance argument into a default parameter.

This might be too heavy-weight for a simple problem though :)
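For illustration, hypothetical call sites under that proposal might look like this (names as in the sketch above; `lu.rank` taking a tolerance argument is an assumption):

```rust
// The library picks its internal default, e.g. lu[[0, 0]].abs() * T::epsilon().
let r1 = lu.rank(Tolerance::Internal);

// The caller supplies an explicit tolerance.
let r2 = lu.rank(Tolerance::Specified { tol: 1.0e-10 });
```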
You bring up some good points, @Prismatica. I agree that anything except proper support for default parameters in Rust is suboptimal. In lieu of that, and the expectation that perhaps 99+% of users would be happy with the default tolerance, my personal opinion is that we should just pick a sensible default tolerance for now, and not let the user configure it.
I've been thinking a good deal about how to select appropriate tolerances lately, and I think I've come to a conclusion from which a general procedure to determine an appropriate tolerance can be drawn. I will write this up properly in an issue when I have time. Currently I'm traveling, so I'm a little preoccupied most of the time.
However, I first just want to convince you that even in the example @AtheMathmo proposed, `max(A) * T::epsilon()` would still be appropriate. First, we know that it's equivalent to the problem with matrix `B = A / max(A)`, in which case an epsilon value of `T::epsilon()` is appropriate.

However, this is not so convincing when you think about the conundrum introduced by @AtheMathmo: what if `max(A) ~ 1 / T::epsilon()`? In this case you'd have `eps = 1`. This seems very coarse for a zero criterion! Perhaps counter to your intuition, there's actually nothing wrong about this - indeed, `eps == 1` is a very appropriate choice in this case. To understand this, we must first understand why we want to distinguish "rank" from "numerical rank" in the first place.

Let us take a perfectly singular matrix `C`. Let us now perform a few arithmetic operations on `C`, giving us a singular matrix `D`. That is, `D` is singular in exact arithmetic. However, because of round-off errors, `D` is in fact not singular anymore in floating-point arithmetic, and so it would have a higher rank than the exact rank. The idea of numerical rank is to try to account for these round-off errors and so approximate the exact rank. What this tells us is that what we are interested in is not whether any element is "close to zero". Instead, what we are interested in knowing is whether any element could have been zero in exact arithmetic.

Going back to our original example with the matrix `A`, consider two elements `a ~ b ~ 1/T::epsilon()`. Let's assume that `a` and `b` are almost exactly the same. In fact, let's assume that `a` and `b` in exact arithmetic would have been exactly the same, but because of some numerical round-off they are close, but not exactly identical. In fact, let's say `a` and `b` are as close to each other as floating-point arithmetic admits. If we now compute `a - b`, what do we get? Well, we'd get something on the order of `1`. In other words, something that in exact arithmetic would have been precisely `0` has become `1` in floating-point arithmetic! This essentially justifies using `eps == 1`.

There's a little more to it, but I hope this provides at least a slightly motivating example. I'll have more to say on it later, but due to the nature of round-off in non-corrected summation, an even more appropriate epsilon value would be something closer to `A.rows() * max(A) * T::epsilon()`! This is problem dependent, but it is due to the fact that the round-off errors induced by straightforward summation are proportional to the number of summands. However, I think `max(A) * T::epsilon()` would do for now, and then we can discuss it more in detail in the future issue.
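The `a - b` argument above can be checked directly in `f64` (a self-contained sketch): near `1 / f64::EPSILON` the spacing between adjacent floats is about 1, so a difference that is exactly zero in exact arithmetic can round to 1.

```rust
fn main() {
    let a = 1.0_f64 / f64::EPSILON; // 2^52, about 4.5e15
    let b = a + 1.0;                // the next representable f64 above `a`
    // In exact arithmetic a and b could represent the same quantity,
    // yet their floating-point difference is a full 1.0.
    println!("{}", b - a); // prints 1
}
```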
Thanks for writing this up! It is interesting to think about how the problem unfolds when you consider the practical implications in addition to the theoretical.
I am fairly convinced by your argument. I would certainly benefit from reading an issue if you do have time to write it up but until then I am happy to settle with this solution for the current PR. Thanks again!
Thanks for both of your work on this - the PR looks to be in a really good place! Once we wrap up the final parts of the discussion I'm happy to merge this :)
With the recent changes, I'm happy to merge this PR as is!
Instead of directly using `T::epsilon()` to come up with a numerical rank, we now use `lu[[0, 0]].abs() * T::epsilon()`.
OK, I think this recent set of changes addresses most of the comments in this PR. The things left for a future PR are: […]

Anything I missed?
Sorry, I still have a few minor things to comment on (corner cases, futureproofing), but my criticism is very granular at this point!
src/matrix/decomposition/lu.rs
Outdated
````rust
/// assert!(lu.is_singular());
/// # }
/// ```
pub fn is_singular(&self) -> bool {
````
I think perhaps the name `is_singular` is a little unfortunate if we wish to move forward with our plans of extending LU to rectangular matrices. In general, one only talks about singular matrices if they are also square. Hence, it would be a little strange to refer to a rectangular matrix as 'singular'. More precisely, is a rectangular matrix of full row/col rank singular, or non-singular? But if it's non-singular it's still not invertible!

The benefit of flipping it around and instead using `is_invertible` is that a matrix is invertible if and only if it is square and has no zeros on the diagonal of U. Thus it also extends nicely to rectangular matrices, since they can never be invertible (and will always return `false`).

Did this make any sense?
Also, a relatively minor thing, but it is sufficient to check the last diagonal entry of `U`:

```rust
// Assuming is_invertible(...)
if diag_size > 0 {
    let diag_last = diag_size - 1;
    self.lu[[diag_last, diag_last]].abs() > self.epsilon()
} else {
    // Are empty matrices invertible or not? Well, rank(A) = n if A is n-by-n
    // with n = 0, so I'd say it's invertible! This seems to be the most
    // consistent choice. It would be good to have a comment in the code
    // about this, to make it clear that this is a deliberate choice.
    true
}
```
src/matrix/decomposition/lu.rs
Outdated
```rust
}

fn epsilon(&self) -> T {
    self.lu[[0, 0]].abs() * T::epsilon()
```
We actually need to take into account the corner case of empty matrices here (in which case accessing `[0, 0]` will panic). The easiest is probably to use e.g. `diag`:

```rust
let max_element = self.lu.diag().next().unwrap_or(T::one());
max_element.abs() * T::epsilon()
```
Sorry, the simplest is probably to use `get`! I.e. `self.lu.get(0, 0).unwrap_or(T::one())`.
I'm very happy with this PR now!
I think it represents an important and high-quality contribution to the library. Thank you for your patience and your work, @Prismatica! As you have seen, these kinds of contributions are a learning process for all of us, so it can take some time to actually agree on all the details. That said, I've been much less responsive during the course of this PR than I'd like to be, and I hope that I can be more available from now on.
@Prismatica: would you also be interested in taking responsibility for writing up an issue on the case of support for rectangular matrices in LU decomposition? It would perhaps be useful to briefly discuss it before making a new PR (although if you feel like it you're welcome to start one now already). Since you've been working on this you are probably the best qualified of us to write down your thoughts on this subject!
Sure, I'd be happy to write up an issue for rectangular matrix support. I can probably get started in a few days. Thanks for everyone's guidance on this PR!
I just remembered a detail about Rust's underflow behavior. Please see the inline comment!
src/matrix/decomposition/lu.rs
Outdated
```rust
let diag_size = cmp::min(self.lu.rows(), self.lu.cols());
let diag_last = diag_size - 1;
```
I just remembered that Rust only wraps underflows in release mode. This means that in debug mode, this would panic if `diag_size == 0` - that is, for empty matrices. You may perhaps want to write a simple test with an empty matrix...?
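A minimal sketch of one way to avoid the underflow (hypothetical helper; `usize::checked_sub` returns `None` instead of panicking in debug mode):

```rust
use std::cmp;

// None for an empty matrix, Some(last diagonal index) otherwise.
fn last_diag_index(rows: usize, cols: usize) -> Option<usize> {
    cmp::min(rows, cols).checked_sub(1)
}

fn main() {
    assert_eq!(last_diag_index(3, 4), Some(2));
    assert_eq!(last_diag_index(0, 0), None); // no debug-mode panic
}
```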
OK, thanks. I added a test and changed the implementation to avoid panicking on underflow.
One minor review comment - I think we can overlook this for now (to tackle it later) but let's wait for @Andlon to comment.
Another issue not wholly related to this PR is the leftover tests for the old LU decomposition function. Previously @Andlon rewrote one of these tests for the `PartialPivLU` changes (in the tests folder). In the future we will want to remove the old function and its tests, but we should probably discuss whether we need to port any for complete pivoting (or even keep the current `PartialPivLU` ports). In a similar vein - we may want to provide some consistency tests comparing `PartialPivLU` with `LUPQ` (checking that they `solve`, `inverse` and `det` to the same values).

I think all of these can be addressed outside of this PR. I'm happy to merge the PR as it stands (once we resolve the `det` comment).
And I want to echo @Andlon's comments. Thank you for your hard work on this PR! It really is a fantastic contribution and I've enjoyed learning a lot through this process.
```rust
///
/// # Panics
/// If the underlying matrix is non-square.
pub fn det(&self) -> T {
```
I have the same comment here as I had for #150. If we have an empty matrix, I believe that this returns `T::one()` - which is probably not what we want.
See my comment on #150. I believe `det(A) == 1` if `A` is an empty matrix is consistent across all usual definitions (the determinant of an empty matrix is an empty product, which is 1), so it actually makes sense!
Great. Looks good to me now! Though, to be super-nitpicky, I'm not a big fan of the introduction of […]
Ok - I'll merge this now. Thanks again @Prismatica!
@AtheMathmo: I have some very extensive plans for improved testing across all decompositions once the API rework of the remaining decompositions is complete. This incorporates the QuickCheck-based test suite we've discussed before (testing for e.g. […]).

I can also say that I will most likely have a lot more free time for some extended period of time, starting next week, so I hope to get a lot of work done on the decompositions in the next few weeks. I've briefly started looking at QR already. Hopefully I'll have a PR ready next week.
@Andlon: That's great news! I envy you - I'm still thoroughly crushed by research and course work! I can only really be hopeful that in the summer I will have some free time to really dig into Rust again. (Until then, I will still be as active as I can be. I'm just not in much of a position to start any exciting PRs myself :( )
@AtheMathmo: I can well understand! Luckily you've managed to recruit enough contributors that we should be able to hold the fort while you are indisposed :-)
Let me know if I should add anything, or if I need to refactor things! I didn't add a number of functions that might be useful, like `is_invertible`, `null_space`, `chosen_pivots`, `column_space`, etc.