Consider scale-invariant convergence criterion for eigenproblems? #60
KrylovKit.jl uses an absolute convergence criterion, norm(residual) < tol. Some packages, like ArnoldiMethod.jl and Arpack.jl, use a slightly different criterion, norm(residual) < tol * |λ|, where λ is the eigenvalue associated with the residual vector (modulo some absolute tolerance for when |λ| ≈ 0). To me, this makes a lot of sense, as it makes convergence of an eigenpair invariant to the overall scale of A, to inversion of A, and to the size of A on the eigenspace in question. I suppose you could think of it as setting a relative tolerance for eigenvalues, or equivalently an absolute tolerance for eigenvector directions*. It's particularly relevant to my current application, where I'm solving for the smallest eigenvalue of Hermitian positive definite matrices and the numbers can get quite small (or large if I invert).

Is modifying the convergence criterion something you might consider for KrylovKit.jl?

*The intuition I'm describing here applies to symmetric/Hermitian problems, where Schur vectors and eigenvectors are the same, so norm(residual) = ‖Ax - λx‖₂ with normalized x. I'm not sure exactly how the intuition translates to Schur vector residuals.
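As a quick illustration of the difference between the two criteria described above, here is a minimal sketch of my own (not code from KrylovKit.jl, ArnoldiMethod.jl, or Arpack.jl; the names `converged_abs`, `converged_rel`, `r`, `λ`, `tol`, and `atol` are placeholders):

```julia
using LinearAlgebra

# Absolute criterion (the behavior described above for KrylovKit.jl):
# the residual norm must drop below a fixed threshold, independent of the eigenvalue.
converged_abs(r, λ, tol) = norm(r) < tol

# Scale-invariant criterion (ArnoldiMethod.jl / Arpack.jl style, per the description above):
# the threshold scales with |λ|, with a small absolute floor for eigenvalues near zero.
converged_rel(r, λ, tol; atol = eps()) = norm(r) < max(tol * abs(λ), atol)
```

With `converged_rel`, rescaling A (and hence both r and λ) by a constant leaves the outcome unchanged, which is the invariance the question is about.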
I'm not an expert in numerical linear algebra btw., so let me rephrase the question: Is there a reason you prefer the current convergence criterion over a scale-invariant one?
Thanks for this question. I don't think there is a well-established reason for preferring an absolute tolerance as I currently do. The only problem is indeed with eigenvalues close to zero, as you indicate. Having to special-case this, and then anyway rely on an absolute tolerance for that special case, together with a relative tolerance for the 'normal' case, would make the code more complicated. I admit that in many use cases where I use […]. Another reason for the current implementation is the analogy with […].
Thanks for responding! I understand the concern about code complexity. I'm not sure this needs to add that much complexity, though: Arpack.jl's criterion is […]. Then there's of course the option of making both […]. I haven't thought about similarities and differences to […].

Adding a bit of rambling about what I'm doing and thinking that makes me care about this issue: one thing I've been trying to do is to benchmark different packages and algorithms for finding the minimum eigenvalue of Hermitian positive definite operators of varying sparsity. However, it's a little tricky to make everything comparable when the convergence criteria are different. Right now my solution is to only test with matrices where the smallest eigenvalue is close to 1. However, for my actual use case, I need reliable results as the spectral gap shrinks through orders of magnitude.

When solving directly for the smallest eigenvalue I can always solve in two stages, first with […] (see the sketch below). However, it usually seems preferable to invert, in which case the problem is that the tolerance is too strict rather than too lenient. That's harder to work around.

ArnoldiMethod.jl seems to work very well, so it's not at all critical for my work to get this change into KrylovKit.jl. The only drawback there is that it doesn't implement the Lanczos algorithm for Hermitian matrices, but that's not really a problem.
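The two-stage idea mentioned above could look roughly like the following sketch of my own (assuming KrylovKit's `eigsolve(A, x₀, howmany, :SR; tol = ...)` interface; the helper name and `coarse_tol` are made up for illustration):

```julia
using KrylovKit, LinearAlgebra

# Stage 1: run with a loose tolerance just to learn the magnitude of the smallest
# eigenvalue. Stage 2: rerun with an absolute tolerance derived from that estimate,
# so that the final answer has roughly `rtol` relative accuracy.
function smallest_eig_two_stage(A, rtol; coarse_tol = 1e-3)
    x0 = rand(eltype(A), size(A, 1))
    vals, _, _ = eigsolve(A, x0, 1, :SR; tol = coarse_tol)   # rough estimate of λ_min
    λ_est = vals[1]
    vals, vecs, info = eigsolve(A, x0, 1, :SR; tol = rtol * abs(λ_est))
    return vals[1], vecs[1], info
end
```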
I am still not fully convinced. I prefer that the algorithm uses some absolute tolerance. If you want something that scales with the overall scaling of […]. Letting the tolerance only depend on some norm estimate for […].

The only solution that then remains is to have both an absolute and a relative tolerance, and use e.g. the least strict of the two (thereby assuming that, if you don't want to use one of the two tolerances, you set it to zero exactly). That could be an option, but it requires some more work in the implementation.
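For what it's worth, the "least strict of both" combination could be as simple as the following check (a sketch under the assumption that both tolerances are exposed as keywords; this is not an actual KrylovKit.jl API):

```julia
using LinearAlgebra

# Combined criterion: whichever of the absolute and relative thresholds is larger
# (i.e. least strict) wins, so setting one of them to zero effectively disables it.
converged(r, λ; atol = 0.0, rtol = 0.0) = norm(r) <= max(atol, rtol * abs(λ))
```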
The motivation is precisely that it depends on each eigenvalue, meaning that each converged eigenvalue is correct to a specified number of significant digits. This also means that the accuracy of a converged eigenpair is invariant to whether you use forward or shift-and-invert iterations. Finally, it means that each converged eigenvector is correct to some angular tolerance, regardless of whether the associated eigenvalue is large or small (with the current behavior, eigenvectors associated with smaller eigenvalues have a looser tolerance).

As I mentioned, I'm using inverse iterations to find the minimum eigenvalue of a Hermitian positive definite operator. If this is close to zero, the corresponding eigenvalue of the inverse will be very large, so I'll need to set a fairly loose absolute tolerance in order to get convergence (at least within a reasonable number of iterations). However, I don't know a priori what the magnitude of this eigenvalue will be, and hence what the appropriate tolerance is, so if I were to use KrylovKit I'd have to employ some kind of two-stage scheme as sketched above (perhaps using forward iterations in the first stage).

I definitely think the ability to set both an absolute and a relative tolerance would be the best solution, with a behavior like […]. I totally understand concerns about code complexity and dev effort, and unfortunately, I don't think I'd be able to help out with implementation right now. Just offering this up as a suggestion: it's something I need in my current project, but I've been quite happy using ArnoldiMethod.
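For the Hermitian case, the "significant digits" and "angular tolerance" claims can be connected to standard residual bounds; the following is my own addition (not part of the thread), with (λ_i, v_i) the exact eigenpair closest to the Ritz value λ and δ the gap to the rest of the spectrum:

```latex
% Standard bounds for a unit-norm Ritz pair (lambda, x) of a Hermitian A,
% with residual r = A x - lambda x:
\[
  \min_i \lvert \lambda_i(A) - \lambda \rvert \;\le\; \lVert r \rVert_2,
  \qquad
  \sin \angle(x, v_i) \;\le\; \frac{\lVert r \rVert_2}{\delta},
  \quad
  \delta = \min_{j \ne i} \lvert \lambda_j(A) - \lambda \rvert .
\]
% Under the relative criterion ||r||_2 <= tol * |lambda| these become
\[
  \frac{\lvert \lambda_i - \lambda \rvert}{\lvert \lambda \rvert} \;\le\; \mathrm{tol},
  \qquad
  \sin \angle(x, v_i) \;\le\; \mathrm{tol}\,\frac{\lvert \lambda \rvert}{\delta},
\]
% and both right-hand sides are unchanged if A is rescaled by a constant.
```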
I don't understand why you use inverse iterations for the minimum eigenvalue of a Hermitian positive definite operator. The fact that it is close to zero does not mean that you need the inverse: it is still an extremal eigenvalue, and you should be able to find it using just […].

I also don't understand the argument that the relative tolerance would then be the same for normal and shift-and-invert iterations; I don't see how to relate ||A v - lambda v|| to ||(A - mu)^{-1} v - (lambda - mu)^{-1} v||. Could you elaborate on that?

Krylov methods are a generalisation of the power method, and by this feature they converge best for extremal eigenvalues, in particular those largest in magnitude. So there are many applications where what you want to find is the largest-magnitude eigenvalues, and the largest will converge the most quickly, and so forth. However, with a tolerance that scales with |lambda|, the largest would actually have the loosest tolerance, and then the second one a slightly stricter tolerance, which seems contradictory.
I profiled both ways: just passing the matrix with […]. These two approaches illustrate the issue quite well: with the current convergence criterion, if the smallest eigenvalue is much smaller than 1, they obtain vastly different precision, the inverse method being much more accurate. Specifically, say the smallest eigenvalue is […]
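To make the precision difference concrete (illustrative numbers of my own, since the comment above is cut off at this point): suppose the smallest eigenvalue is ε = 10⁻⁶ and the absolute tolerance is tol = 10⁻⁶.

```latex
% Forward iterations on A: the absolute criterion only pins epsilon down to within
% about tol, which is as large as epsilon itself:
\[
  \lvert \hat\varepsilon - \varepsilon \rvert \;\lesssim\; 10^{-6}
  \quad\Rightarrow\quad
  \text{relative error up to } \mathcal{O}(1).
\]
% Inverse iterations on A^{-1}: the same absolute tolerance acts on 1/epsilon = 10^{6},
% i.e. a relative error of about 10^{-12}, which carries over to epsilon itself:
\[
  \bigl\lvert \widehat{\varepsilon^{-1}} - \varepsilon^{-1} \bigr\rvert \;\lesssim\; 10^{-6}
  \quad\Rightarrow\quad
  \frac{\lvert \hat\varepsilon - \varepsilon \rvert}{\varepsilon} \;\approx\; 10^{-12}.
\]
```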
I suppose I was mainly referring to forward and inverse iterations, like in the example above (otherwise, you usually don't have a choice). But to take the shift-and-invert case: if you use the relative convergence criterion, tol becomes the relative tolerance on both (lambda - mu)^{-1} and (lambda - mu), i.e., both are correct to about a factor (1 + tol); however, with the absolute convergence criterion, tol is the absolute tolerance on (lambda - mu)^{-1}, which means that the tolerance on (lambda - mu) is actually tol * (lambda - mu)^2. That's not very intuitive: the larger the number, the fewer the significant digits; the smaller, the more. That's what I mean when I say that a relative tolerance is invariant to whether you are looking at inverses or not, while an absolute tolerance isn't. (I'd write out the math if I could use TeX; let me know if the derivation is unclear.)
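Since TeX wasn't available in the thread, here is my own rendering of that propagation argument (same notation as the comment above, to first order only):

```latex
% Let y = (lambda - mu)^{-1} and suppose the computed value is yhat = y + delta.
% Absolute criterion on the inverted problem: |delta| <= tol. To first order,
\[
  \hat\lambda - \mu \;=\; \hat y^{-1} \;\approx\; (\lambda - \mu) \;-\; \delta\,(\lambda - \mu)^2 ,
\]
% so the induced absolute tolerance on (lambda - mu) is tol * (lambda - mu)^2.
% Relative criterion instead: |delta| <= tol * |y|, and then
\[
  \frac{\bigl\lvert (\hat\lambda - \mu) - (\lambda - \mu) \bigr\rvert}{\lvert \lambda - \mu \rvert}
  \;\approx\; \frac{\lvert \delta \rvert}{\lvert y \rvert} \;\le\; \mathrm{tol},
\]
% i.e. the same relative accuracy for (lambda - mu)^{-1} and for (lambda - mu).
```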
I prefer to be able to say "I want the two largest eigenvalues, both of them with 9 significant figures." Absolute tolerances are sometimes needed because you need to define the concept of "approximately zero", but for all other purposes, relative precision is more meaningful. I hope that helped explain my reasoning a little better.