
Add the "decollage" process for raw microscope output to the package #22

Merged (12 commits into main, Aug 27, 2024)


metazool (Collaborator)

See #21 for context and links to the original. This moves a rough script from the internal project into the package and refactors it for use in a future pipeline, which is not yet specified.

  • Adds test fixtures: a single sample image of the composite output as saved on the FlowCam instrument, and an extract of the metadata corresponding to it
  • Adds minimal tests
  • Writes EXIF headers to each individual image with geographic location and date, derived from the file naming conventions
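A minimal sketch of the EXIF step, assuming every sample name follows the pattern of the single fixture (`<prefix>_<lat>_<lon>_<DDMMYYYY>_<index>`). The regex, the day-month-year reading of the date field, and the helper names `parse_sample_name` and `to_exif_gps` are hypothetical, not taken from the package. EXIF stores latitude and longitude as three rationals (degrees, minutes, seconds) plus an N/S or E/W reference, so the sketch converts decimal degrees into that form.

```python
import re
from datetime import date

# Hypothetical pattern inferred from one fixture name, e.g.
# "MicrobialMethane_MESO_Tank10_54.0143_-2.7770_04052023_1".
# Reading the 8-digit date as DDMMYYYY is an assumption.
FILENAME_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<lat>-?\d+\.\d+)_(?P<lon>-?\d+\.\d+)"
    r"_(?P<date>\d{8})_(?P<index>\d+)$"
)


def parse_sample_name(name: str) -> dict:
    """Extract latitude, longitude and date from a sample directory name."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised sample name: {name!r}")
    raw_date = match["date"]
    return {
        "lat": float(match["lat"]),
        "lon": float(match["lon"]),
        "date": date(int(raw_date[4:]), int(raw_date[2:4]), int(raw_date[:2])),
    }


def to_exif_gps(value: float, positive_ref: str, negative_ref: str):
    """Convert decimal degrees to an EXIF-style reference plus
    ((deg, 1), (min, 1), (sec * 100, 100)) rational pairs."""
    ref = positive_ref if value >= 0 else negative_ref
    # Work in whole hundredths of a second to avoid float drift.
    hundredths = round(abs(value) * 3600 * 100)
    degrees, remainder = divmod(hundredths, 3600 * 100)
    minutes, seconds_hundredths = divmod(remainder, 60 * 100)
    return ref, ((degrees, 1), (minutes, 1), (seconds_hundredths, 100))
```

Rounding to hundredths of a second before splitting into degrees, minutes and seconds keeps float error from ever producing a seconds value of 60.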

To test

Run unit tests

export PYTHONPATH=.
py.test cyto_ml/tests/test_decollage.py

Run from the command line (stopgap)

python cyto_ml/data/decollage.py fixtures/MicrobialMethane_MESO_Tank10_54.0143_-2.7770_04052023_1 test

The last argument is an "experiment name" used to name the output files. This is a stop-gap set of changes; I didn't want to go any further while it's still not clear how the workflow fits together (see #9).
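The stop-gap command takes two positional arguments. A minimal `argparse` sketch of that interface might look like this; the argument names, help text and the `output_name` numbering scheme are hypothetical illustrations, not the script's actual code.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Two positional arguments, mirroring the stop-gap command line."""
    parser = argparse.ArgumentParser(
        description="Split a FlowCam collage directory into individual images."
    )
    parser.add_argument(
        "sample_dir", type=Path,
        help="directory holding the collage image(s) and metadata",
    )
    parser.add_argument(
        "experiment_name",
        help="label used to name the output files",
    )
    return parser


def output_name(experiment_name: str, index: int, suffix: str = ".tif") -> str:
    # Hypothetical naming scheme: <experiment>_<zero-padded index><suffix>
    return f"{experiment_name}_{index:05d}{suffix}"
```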

What this doesn't cover

One discovery here is that there's a lot of metadata for individual images, based on segmentation and shape analysis performed onboard the FlowCam: far more detail than I thought we'd have access to.

Since we don't have a clear use case for it yet, I haven't attempted to do anything with it here. I can see the output being useful either dropped into the object store and picked up for use with dask via intake, or indexed in a lightweight database such as sqlite/datasette.
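If the onboard metadata were indexed in SQLite, a minimal sketch using only the standard-library `sqlite3` module could look like the following. The table layout and the `area` and `aspect_ratio` columns are hypothetical stand-ins for whichever FlowCam measurements turn out to be useful.

```python
import sqlite3


def index_metadata(rows, db_path=":memory:"):
    """Store per-image metadata dicts in a SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               filename TEXT PRIMARY KEY,
               area REAL,
               aspect_ratio REAL
           )"""
    )
    # Named placeholders are filled from each dict's keys.
    conn.executemany(
        "INSERT OR REPLACE INTO images VALUES (:filename, :area, :aspect_ratio)",
        rows,
    )
    conn.commit()
    return conn
```

With `db_path` pointed at a file such as `images.db`, the result could then be browsed with `datasette images.db`.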

metazool requested a review from Kzra on August 15, 2024 09:26
metazool (Collaborator, Author)

@Kzra, your feedback would be particularly appreciated. If you're still generating new images on a regular basis, this should be directly useful to you now, and faster, since we're no longer re-reading the large TIFF every time we extract a small window.
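The speed-up comes from loading each collage TIFF once and slicing every window out of the in-memory array, rather than re-reading the file per image. A minimal sketch, assuming the metadata gives a collage filename plus a pixel offset and size for each image; the field names `collage_file`, `image_x`, `image_y`, `image_w` and `image_h` are assumptions about the metadata layout, and the TIFF loading step is only indicated in a comment.

```python
from collections import defaultdict

import numpy as np


def group_by_collage(records):
    """Group (x, y, width, height) windows by the collage file they come from.

    Field names are assumed, not confirmed against the FlowCam export.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["collage_file"]].append(
            (rec["image_x"], rec["image_y"], rec["image_w"], rec["image_h"])
        )
    return grouped


def crop_windows(collage: np.ndarray, windows):
    """Yield sub-images from an already-loaded collage (origin top-left)."""
    for x, y, w, h in windows:
        # Rows are y, columns are x; slicing returns a view, not a copy.
        yield collage[y:y + h, x:x + w]


# Usage idea: read each collage once, e.g. with tifffile.imread(path),
# then crop all of its windows from memory:
# for path, windows in group_by_collage(records).items():
#     collage = tifffile.imread(path)
#     for crop in crop_windows(collage, windows): ...
```

Each crop is a NumPy view into the collage, so no pixel data is copied until a crop is written out.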

If not, I guess the next step is to add a binary classifier that attaches a probably-junk flag, and maybe a confidence metric, to each output image, with the option of not sending anything on to the cloud at this stage, and to see whether we can do that with the new object_store_api.
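The proposed flagging step could be sketched as below, assuming a hypothetical classifier `score_fn` that returns the probability an image is junk. The 0.5 threshold, the `Prediction` record and the `to_upload` filter are illustrative only, and the actual upload through object_store_api is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Prediction:
    filename: str
    probably_junk: bool
    confidence: float  # probability of the predicted class


def flag_images(
    filenames: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Prediction]:
    """Attach a probably-junk flag and a confidence to each image.

    score_fn is a hypothetical binary classifier returning P(junk) in [0, 1].
    """
    predictions = []
    for name in filenames:
        p_junk = score_fn(name)
        is_junk = p_junk >= threshold
        confidence = p_junk if is_junk else 1.0 - p_junk
        predictions.append(Prediction(name, is_junk, confidence))
    return predictions


def to_upload(predictions: Iterable[Prediction], drop_junk: bool = True) -> List[str]:
    """Filenames to send on to the cloud, optionally skipping flagged images."""
    return [p.filename for p in predictions if not (drop_junk and p.probably_junk)]
```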

metazool requested a review from albags on August 27, 2024 10:33
This was referenced Aug 27, 2024
Kzra (Collaborator) left a comment


Great work speeding up the decollage process.

Kzra merged commit 634c800 into main on Aug 27, 2024
albags (Collaborator) left a comment


Great work adding the decollage process.


3 participants