Higher-level system view walkthrough diagrams #9

metazool · 2024-07-09T11:14:06Z

There's a Miro board with a walkthrough picture of the end-to-end flow from instrument to inference, made when this project was first proposed as one the RSE group could pick up as an onboarding exercise. Our limited Miro plan isn't great for collaboration or reuse, though. Writing diagrams as dotfiles has been a good experience, either with sketchviz or a VSCode extension. What's missing is a pipeline to render them and publish them embedded in markdown to Pages, which would be useful elsewhere too.
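A minimal sketch of what the dotfile side of that could look like, not the project's actual pipeline. The stage names, the `to_dot` and `render_command` helpers and the output layout are all hypothetical; the only external piece assumed is the Graphviz `dot` command with its `-Tsvg` and `-o` options.

```python
from pathlib import Path

# Hypothetical stage names, loosely following the flow described in this issue.
STAGES = ["instrument", "shared storage", "processing", "object storage", "inference"]


def to_dot(stages, name="walkthrough"):
    """Return DOT source for a simple left-to-right chain of stages."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for stage in stages:
        lines.append(f'  "{stage}";')
    for src, dst in zip(stages, stages[1:]):
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


def render_command(dot_file, out_dir):
    """Build the Graphviz command that renders one dotfile to SVG.

    The command is only constructed here, not run; a CI job could pass it
    to subprocess.run before the Pages build picks up the SVG.
    """
    svg = Path(out_dir) / (Path(dot_file).stem + ".svg")
    return ["dot", "-Tsvg", str(dot_file), "-o", str(svg)]
```

The rendered SVG can then be referenced from markdown as an ordinary image, so the dotfile stays the single editable source.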

The wider aim here is to identify components in other projects that could be tested with this use case (such as on-prem Apache Beam or Airflow installations) and to provide some natural prioritisation for the laundry list of possible next steps. Cf. the places to intervene that emerged from the Miro diagram, listed below; the list is probably not exhaustive, and the existing work only covers the last stages:

Workflow stages

Link the sample acquisition date and time to the output data of the instrument (in this case the FlowCam)
Navigate security concerns about saving output straight onto a network drive rather than manual transfer
Package up the input-to-analysis-ready processing in a way that could be run as a pipeline, e.g. by Airflow
Process to poll for new source data, process them, and upload to object storage without manual triggering
Binary classifier(s) to sift volumes of uninteresting data to save on excess cloud storage
Establish the best running consensus we can on managing credentials for server-side use with cloud storage
Same consensus but for client-side use, handling SSO and whether it makes sense to lean on Posit Connect to do that
Write-safe options for metadata, general preferences for vector and document stores that are easy to audit and (re)deploy
Standards-oriented metadata catalogue interfaces
Proof of concept feature extraction from images and sight of its future applications
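
As an illustration of the "poll for new source data" stage above, here is a rough sketch under stated assumptions rather than the existing implementation. The `find_new_files` and `poll_once` names, the `*.tif` pattern and the `handle` callback (standing in for "process and upload to object storage") are all hypothetical.

```python
from pathlib import Path


def find_new_files(source_dir, seen, pattern="*.tif"):
    """Return files under source_dir matching pattern that are not in seen."""
    current = {p for p in Path(source_dir).glob(pattern) if p.is_file()}
    return sorted(current - seen)


def poll_once(source_dir, seen, handle, pattern="*.tif"):
    """Run one polling pass: handle each new file, then record it as seen.

    A file is only added to seen after handle returns, so a file whose
    processing raises is picked up again on the next pass.
    """
    processed = []
    for path in find_new_files(source_dir, seen, pattern):
        handle(path)  # hypothetical callback, e.g. process and upload
        seen.add(path)
        processed.append(path)
    return processed
```

A scheduler such as Airflow or a plain cron job could call `poll_once` at an interval; the in-memory `seen` set would need to be persisted somewhere between runs for that to work.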

metazool · 2024-09-11T12:51:22Z

https://nerc-ceh.github.io/plankton_ml/diagrams/ - this is unfinished as it stands. The diagrams that exist were really useful for prompting conversation with support teams about future options for #20 and the pipelines got reused elsewhere.

I think this is a nice visual output: it helps with project communication and is something to refer back to. I would like to complete the drafts as a relative priority.

metazool mentioned this issue Jul 18, 2024

Diagram view of first stage pipeline (from sampling instrument to shared storage) #11

Merged

metazool mentioned this issue Jul 31, 2024

Demo / lightning talk for plankton image data flow #8

Open

metazool mentioned this issue Aug 15, 2024

Add the "decollage" process for raw microscope output to the package #22

Merged

metazool added the documentation Improvements or additions to documentation label Sep 11, 2024
