Consider scale-invariant convergence criterion for eigenproblems? #60
KrylovKit.jl uses an absolute convergence criterion, norm(residual) < tol. Some packages, like ArnoldiMethod.jl and Arpack.jl, use a slightly different criterion, norm(residual) < tol * |λ|, where λ is the eigenvalue associated with the residual vector (modulo some absolute tolerance for when |λ| ≈ 0). To me, this makes a lot of sense, as it makes convergence of an eigenpair invariant to the overall scale of A, to inversion of A, and to the size of A on the eigenspace in question. I suppose you could think of it as setting a relative tolerance for eigenvalues, or equivalently an absolute tolerance for eigenvector directions*. It's particularly relevant to my current application, where I'm solving for the smallest eigenvalue of Hermitian positive definite matrices and the numbers can get quite small (or large if I invert).

Is modifying the convergence criterion something you might consider for KrylovKit.jl?

*The intuition I'm describing here applies to symmetric/Hermitian problems, where Schur vectors and eigenvectors are the same, so norm(residual) = ‖Ax - λx‖₂ with normalized x. I'm not sure exactly how the intuition translates to Schur vector residuals.
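As a quick illustration of the difference between the two criteria described above, here is a minimal sketch of my own (not code from KrylovKit.jl, ArnoldiMethod.jl, or Arpack.jl; the names `converged_abs`, `converged_rel`, `r`, `λ`, `tol`, and `atol` are placeholders):

```julia
using LinearAlgebra

# Absolute criterion (the behavior described above for KrylovKit.jl):
# the residual norm must drop below a fixed threshold, independent of the eigenvalue.
converged_abs(r, λ, tol) = norm(r) < tol

# Scale-invariant criterion (ArnoldiMethod.jl / Arpack.jl style, per the description above):
# the threshold scales with |λ|, with a small absolute floor for eigenvalues near zero.
converged_rel(r, λ, tol; atol = eps()) = norm(r) < max(tol * abs(λ), atol)
```

With `converged_rel`, rescaling A (and hence both r and λ) by a constant leaves the outcome unchanged, which is the invariance the question is about.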
I'm not an expert in numerical linear algebra btw., so let me rephrase the question: Is there a reason you prefer the current convergence criterion over a scale-invariant one?
Thanks for this question. I don't think there is a well-established reason for preferring an absolute tolerance as I currently do. The only problem is indeed with eigenvalues close to zero, as you indicate. Having to special-case this, and then anyway rely on an absolute tolerance for that special case, together with a relative tolerance for the 'normal' case, would make the code more complicated. I admit that in many use cases where I use […]. Another reason for the current implementation is the analogy with […].
Thanks for responding! I understand the concern about code complexity. I'm not sure this needs to add that much complexity, though: Arpack.jl's criterion is […]. Then there's of course the option of making both […]. I haven't thought about similarities and differences to […].

Adding a bit of rambling about what I'm doing and thinking that makes me care about this issue: one thing I've been trying to do is to benchmark different packages and algorithms for finding the minimum eigenvalue of Hermitian positive definite operators of varying sparsity. However, it's a little tricky to make everything comparable when the convergence criteria are different. Right now my solution is to only test with matrices where the smallest eigenvalue is close to 1. However, for my actual use case, I need reliable results as the spectral gap shrinks through orders of magnitude.

When solving directly for the smallest eigenvalue I can always solve in two stages, first with […] (see the sketch below). However, it usually seems preferable to invert, in which case the problem is that the tolerance is too strict rather than too lenient. That's harder to work around.

ArnoldiMethod.jl seems to work very well, so it's not at all critical for my work to get this change into KrylovKit.jl. The only drawback there is that it doesn't implement the Lanczos algorithm for Hermitian matrices, but that's not really a problem.
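The two-stage idea mentioned above could look roughly like the following sketch of my own (assuming KrylovKit's `eigsolve(A, x₀, howmany, :SR; tol = ...)` interface; the helper name and `coarse_tol` are made up for illustration):

```julia
using KrylovKit, LinearAlgebra

# Stage 1: run with a loose tolerance just to learn the magnitude of the smallest
# eigenvalue. Stage 2: rerun with an absolute tolerance derived from that estimate,
# so that the final answer has roughly `rtol` relative accuracy.
function smallest_eig_two_stage(A, rtol; coarse_tol = 1e-3)
    x0 = rand(eltype(A), size(A, 1))
    vals, _, _ = eigsolve(A, x0, 1, :SR; tol = coarse_tol)   # rough estimate of λ_min
    λ_est = vals[1]
    vals, vecs, info = eigsolve(A, x0, 1, :SR; tol = rtol * abs(λ_est))
    return vals[1], vecs[1], info
end
```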
I am still not fully convinced. I prefer that the algorithm uses some absolute tolerance. If you want something that scales with the overall scaling of […]. Letting the tolerance only depend on some norm estimate for […].

The only solution that then remains is to have both an absolute and a relative tolerance, and use e.g. the least strict of the two (thereby assuming that, if you don't want to use one of the two tolerances, you set it to zero exactly). That could be an option, but it requires some more work in the implementation.
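For what it's worth, the "least strict of both" combination could be as simple as the following check (a sketch under the assumption that both tolerances are exposed as keywords; this is not an actual KrylovKit.jl API):

```julia
using LinearAlgebra

# Combined criterion: whichever of the absolute and relative thresholds is larger
# (i.e. least strict) wins, so setting one of them to zero effectively disables it.
converged(r, λ; atol = 0.0, rtol = 0.0) = norm(r) <= max(atol, rtol * abs(λ))
```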
The motivation is precisely that it depends on each eigenvalue, meaning that each converged eigenvalue is correct to a specified number of significant digits. This also means that the accuracy of a converged eigenpair is invariant to whether you use forward or shift-and-invert iterations. Finally, it means that each converged eigenvector is correct to some angular tolerance, regardless of whether the associated eigenvalue is large or small (with the current behavior, eigenvectors associated with smaller eigenvalues have a looser tolerance).

As I mentioned, I'm using inverse iterations to find the minimum eigenvalue of a Hermitian positive definite operator. If this is close to zero, the corresponding eigenvalue of the inverse will be very large, so I'll need to set a fairly loose absolute tolerance in order to get convergence (at least within a reasonable number of iterations). However, I don't know a priori what the magnitude of this eigenvalue will be, and hence what the appropriate tolerance is, so if I were to use KrylovKit I'd have to employ some kind of two-stage scheme as sketched above (perhaps using forward iterations in the first stage).

I definitely think the ability to set both an absolute and a relative tolerance would be the best solution, with a behavior like […]. I totally understand concerns about code complexity and dev effort, and unfortunately, I don't think I'd be able to help out with implementation right now. Just offering this up as a suggestion: it's something I need in my current project, but I've been quite happy using ArnoldiMethod.
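For the Hermitian case, the "significant digits" and "angular tolerance" claims can be connected to standard residual bounds; the following is my own addition (not part of the thread), with (λ_i, v_i) the exact eigenpair closest to the Ritz value λ and δ the gap to the rest of the spectrum:

```latex
% Standard bounds for a unit-norm Ritz pair (lambda, x) of a Hermitian A,
% with residual r = A x - lambda x:
\[
  \min_i \lvert \lambda_i(A) - \lambda \rvert \;\le\; \lVert r \rVert_2,
  \qquad
  \sin \angle(x, v_i) \;\le\; \frac{\lVert r \rVert_2}{\delta},
  \quad
  \delta = \min_{j \ne i} \lvert \lambda_j(A) - \lambda \rvert .
\]
% Under the relative criterion ||r||_2 <= tol * |lambda| these become
\[
  \frac{\lvert \lambda_i - \lambda \rvert}{\lvert \lambda \rvert} \;\le\; \mathrm{tol},
  \qquad
  \sin \angle(x, v_i) \;\le\; \mathrm{tol}\,\frac{\lvert \lambda \rvert}{\delta},
\]
% and both right-hand sides are unchanged if A is rescaled by a constant.
```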
I don't understand why you use inverse iterations for the minimum eigenvalue of a Hermitian positive definite operator. The fact that it is close to zero does not mean that you need the inverse: it is still an extremal eigenvalue, and you should be able to find it using just […].

I also don't understand the argument that the relative tolerance would then be the same for normal and shift-and-invert iterations; I don't see how to relate ||A v - lambda v|| to ||(A - mu)^{-1} v - (lambda - mu)^{-1} v||. Could you elaborate on that?

Krylov methods are a generalisation of the power method, and by this feature they converge best for extremal eigenvalues, in particular those largest in magnitude. So there are many applications where what you want to find is the largest-magnitude eigenvalues, and the largest will converge the most quickly, and so forth. However, with a tolerance that scales with |lambda|, the largest would actually have the loosest tolerance, and then the second one a slightly stricter tolerance, which seems contradictory.
I profiled both ways: just passing the matrix with […]. These two approaches illustrate the issue quite well: with the current convergence criterion, if the smallest eigenvalue is much smaller than 1, they obtain vastly different precision, the inverse method being much more accurate. Specifically, say the smallest eigenvalue is […]
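To make the precision difference concrete (illustrative numbers of my own, since the comment above is cut off at this point): suppose the smallest eigenvalue is ε = 10⁻⁶ and the absolute tolerance is tol = 10⁻⁶.

```latex
% Forward iterations on A: the absolute criterion only pins epsilon down to within
% about tol, which is as large as epsilon itself:
\[
  \lvert \hat\varepsilon - \varepsilon \rvert \;\lesssim\; 10^{-6}
  \quad\Rightarrow\quad
  \text{relative error up to } \mathcal{O}(1).
\]
% Inverse iterations on A^{-1}: the same absolute tolerance acts on 1/epsilon = 10^{6},
% i.e. a relative error of about 10^{-12}, which carries over to epsilon itself:
\[
  \bigl\lvert \widehat{\varepsilon^{-1}} - \varepsilon^{-1} \bigr\rvert \;\lesssim\; 10^{-6}
  \quad\Rightarrow\quad
  \frac{\lvert \hat\varepsilon - \varepsilon \rvert}{\varepsilon} \;\approx\; 10^{-12}.
\]
```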
I suppose I was mainly referring to forward and inverse iterations, like in the example above (otherwise, you usually don't have a choice). But to take the shift-and-invert case: if you use the relative convergence criterion, tol becomes the relative tolerance on both (lambda - mu)^{-1} and (lambda - mu), i.e., both are correct to about a factor (1 + tol); however, with the absolute convergence criterion, tol is the absolute tolerance on (lambda - mu)^{-1}, which means that the tolerance on (lambda - mu) is actually tol * (lambda - mu)^2. That's not very intuitive: the larger the number, the fewer the significant digits; the smaller, the more. That's what I mean when I say that a relative tolerance is invariant to whether you are looking at inverses or not, while an absolute tolerance isn't. (I'd write out the math if I could use TeX; let me know if the derivation is unclear.)
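Since TeX wasn't available in the thread, here is my own rendering of that propagation argument (same notation as the comment above, to first order only):

```latex
% Let y = (lambda - mu)^{-1} and suppose the computed value is yhat = y + delta.
% Absolute criterion on the inverted problem: |delta| <= tol. To first order,
\[
  \hat\lambda - \mu \;=\; \hat y^{-1} \;\approx\; (\lambda - \mu) \;-\; \delta\,(\lambda - \mu)^2 ,
\]
% so the induced absolute tolerance on (lambda - mu) is tol * (lambda - mu)^2.
% Relative criterion instead: |delta| <= tol * |y|, and then
\[
  \frac{\bigl\lvert (\hat\lambda - \mu) - (\lambda - \mu) \bigr\rvert}{\lvert \lambda - \mu \rvert}
  \;\approx\; \frac{\lvert \delta \rvert}{\lvert y \rvert} \;\le\; \mathrm{tol},
\]
% i.e. the same relative accuracy for (lambda - mu)^{-1} and for (lambda - mu).
```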
I prefer to be able to say "I want the two largest eigenvalues, both of them with 9 significant figures." Absolute tolerances are sometimes needed because you need to define the concept of "approximately zero", but for all other purposes, relative precision is more meaningful. I hope that helped explain my reasoning a little better.