Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Way forward for co-registration module #435

Open
rhugonnet opened this issue Sep 6, 2023 · 4 comments
Open

Way forward for co-registration module #435

rhugonnet opened this issue Sep 6, 2023 · 4 comments
Labels
enhancement Feature improvement or request priority Needs to be fixed rapidly

Comments

@rhugonnet
Copy link
Member

rhugonnet commented Sep 6, 2023

We currently have 25 issues open on co-registration 🙀, and a lot of notes and plans for the way forward dispersed everywhere. I initially tried to organize this in a project: https://github.com/orgs/GlacioHack/projects/3/views/2, but I think that the level of detail of the issues is too inconsistent for this, so I'm writing this summary instead!

Architecture:
We already did a lot in #158 and #329 to re-organize the coding structure to be more consistent and robust, but some points are left:

Features:
We've been needing for a while to have consistent:

Most of them are not too far since we introduced a consistent structure for optimizers or binning in #158. It makes weights almost directly supported (but we'll need to raise proper warnings for methods that currently ignore, such as ICP), and will make plotting more easy by consistently treating the type of methods (binning, fit, or both) and the dimensionality of the variables (1D, 2D, ND), which can be re-used for Tilt or NuthKaab fits in the affine functions.

Tests:
Some tests are still a bit old or slow, several related issues could be solved all at once:

Performance:
⚠️ We really need to think ahead for a structure that will allow memory-efficient computations using Dask:

  • For reading/writing with georeferencing, this will depend directly on rioxarray, and Add Xarray accessor gu geoutils#383 and Add Xarray accessor dem #392, so not too much to think about!
  • For Coreg.fit, as a regression requires all samples at once, it cannot be combined from solutions of different chunks (except in a blockwise way using BlockwiseCoreg). So it's all about the subsample which is fairly easy to deal with (read subsample per chunk + return concatenated vector). There's also the computation of derivatives needed, which are also straightforward (slope using overlapping chunks, bias_vars using normal chunks), see the thoughts in Argument subsample does not work for a CoregPipeline #428 (comment). Most other methods residuals, plot can be based on the same logic as they use subsample.
  • It leaves the logic for Coreg.apply, which might not be trivial. For bias corrections, the solution from the optimized function is applicable independently to every pixel given their bias_vars (coordinates, angle, etc), so very easy to apply to chunk. However, for affine methods, applying a 3D affine matrix in 4x4 format lazily to independent chunks won't work directly... it would also require a definition of the rotation center of the matrix, and maybe other things... Any thoughts on how to address this @erikmannerfelt? Or maybe @friedrichknuth has some insights?
  • We could also improve affine pipelines performance with Improve CoregPipeline.apply() and .fit() performance by merging matrices #79,
  • And improve memory performance by adding a default resampling logic, see Provide default downsampling logic in Coreg pre-processing for performance #437.

Bugs:
Here there's a lot, but they might solve themselves (or become irrelevant) after changing architecture + tests:
#423, #422, #404, #326, #232, #193

Idea of plan moving forward:

  1. Think ahead on logic for Performance: already done a bit in Argument subsample does not work for a CoregPipeline #428 (comment). I think we can use the structure proposed there which should work eventually 🤞! For apply, this should be adaptable down the line...
  2. Focus first on things all related to Architecture: to avoid the effort of adapting all the rest if we did it first (new features, bug fixes and tests). A lot of lessons learned from Add BiasCorr classes and rename previous coreg.BiasCorr in coreg.VerticalShift #158 and Re-organize coreg.py #329.
  3. Add basic architectural Tests to ensure 2. is working as intended with what we already have, and add new data for consistent (and quick!) tests (this specific point might require its own thread of discussion).
  4. Add support for all new Features: should be made easier by the consistent architecture!
  5. Add new Tests in parallel of each feature in 4.
  6. See if the Bugs are still relevant, and fix them if they are! 😄
  7. Celebrate, for we would have reached quite a way! 🍾

Any thoughts @adehecq @erikmannerfelt? 😄

@rhugonnet
Copy link
Member Author

To clarify on the Coreg.apply for chunks:

  • The chunked writing of the transformed DEM would be fairly easy...
  • ... But we'd need an apply_matrix_coords function that can transform the 4x4 matrix into a Callable that takes in 2D coordinates as input, and outputs the affine transformation at any given pixel (like is currently done in BiasCorr classes with fit_func).

Is that something we can do currently @erikmannerfelt?

@adehecq
Copy link
Member

adehecq commented Sep 7, 2023

Any thoughts @adehecq @erikmannerfelt? 😄

Wow, I'm a bit overwhelmed with the list of things left to do to have any useful feedback for the moment.
It's great you made an exhaustive list of all the things to be considered for coreg. Now we need two things:

  • time ! There are a lot of changes to be made so we need to coordinate efforts on this. I'm happy to actively contribute to the code writing, but either way, someone will have to review the code so it's better if we do it in parallel rather than having to go through a huge PR
  • split this huge list into a sequence of steps that we can go through.

What was your idea regarding this project. Did you want to work on it in the coming days/weeks?
We should try to set some blocks of time to work together on this. We can coordinate over Slack or face to face.

@rhugonnet
Copy link
Member Author

Yes it is a bit overwhelming like this haha, but thankfully some of these points are fairly small! 🙂

We'll need more detailed lists, but I think those can live within each PR that will address a new step/feature listed above. The architecture work will have to be one big PR, as everything is inter-dependent on the structure.

With all the discussions and the work already done in #158 trying to make the structure generic and robust, I have a clear vision of the technical steps to go through to reach every point. And I don't want to wait too long and lose it! I'd be happy to write a more detailed plan, discuss it, and combine efforts if you both can block some days for it, but can otherwise push on my own in the frame of the coreg study! 😉

My idea was to start on this full time end-of-September (for the co-registration study to reach a good stage before AGU).

@rhugonnet
Copy link
Member Author

rhugonnet commented Sep 11, 2024

One year later, and we've addressed most of the points of this (long) issue and the core coregistration module is now close to being finalized! 🥳 🎈

Now that the structure, input support and modular arguments are fully consistent across all coregistration methods, we can more easily address the three remaining aspects (which are either about maintenance or adding new features):

  1. Improve tests further, which is mostly about the variety of test data, and the way it is pre-processed and passed to tests (see list in first post above), which joins [WIP] [POC] refact: coreg tests #563 started by @adebardo,
  2. Add plots, statistics which joins [POC] add metrics from demcompare geoutils#602 and [POC] Report  #569 by @adebardo, and other some small convenience routines such as Provide default downsampling logic in Coreg pre-processing for performance #437,
  3. Finalize scaling abilities with out-of-memory coregistration greatly moved forward in Dask for coregistration #525 by @ameliefroessl (which can be implicitly supported once Xarray/GeoPandas accessors are implemented!).

Another point that comes to mind would be to add CoregFilter classes to use more easily in the CoregPipeline, but right now the user can always customize his input inlier_mask separately to remedy that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature improvement or request priority Needs to be fixed rapidly
Projects
None yet
Development

No branches or pull requests

2 participants