Higher-level system view walkthrough diagrams #9

Open
metazool opened this issue Jul 9, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@metazool
Collaborator

metazool commented Jul 9, 2024

A Miro board was used to draw a walkthrough picture of the end-to-end flow from instrument to inference when this project was first proposed as one the RSE group could pick up as an onboarding exercise. Our limited Miro plan isn't great for collaboration or reuse, though. Writing diagrams as dotfiles was a good experience, either with sketchviz or a VSCode extension. What's still needed is a pipeline to render them and publish them, embedded in markdown, to Pages; that pipeline would be useful elsewhere too.
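
As a rough illustration of the dotfile approach, here is a minimal sketch that generates DOT source for a simple chain of stages. The stage names and the `to_dot` helper are illustrative assumptions, not taken from the Miro board or the existing diagrams.

```python
def to_dot(name, stages):
    """Return DOT source for a left-to-right chain of stages."""
    lines = [f"digraph {name} {{", "  rankdir=LR;", "  node [shape=box];"]
    for stage in stages:
        lines.append(f'  "{stage}";')
    # Connect each stage to the one that follows it.
    for src, dst in zip(stages, stages[1:]):
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical stage names, loosely following "instrument to inference".
STAGES = [
    "FlowCam",
    "Network drive",
    "Processing pipeline",
    "Object storage",
    "Inference",
]

dot_source = to_dot("plankton_flow", STAGES)
print(dot_source)
```

Saved to a file such as `flow.dot`, the output can be rendered with Graphviz, e.g. `dot -Tsvg flow.dot -o flow.svg`, and the resulting SVG linked from a markdown page.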

The wider aim here is to identify components in other projects that could be tested with this use case (such as on-prem Apache Beam or Airflow installations), and to provide some natural prioritisation for the laundry list of possible next steps. Cf. the places to intervene that emerged from the Miro diagram, listed below; the list is probably not exhaustive, and the existing work only covers the last stages:

Workflow stages

  • Linking the sample acquisition date and time to the output data of the instrument (in this case the FlowCam)
  • Navigate security concerns about saving output straight onto a network drive rather than manual transfer
  • Package up the input-to-analysis-ready processing in a way that could be run as a pipeline, e.g. by Airflow
  • Process to poll for new source data, process them, and upload to object storage without manual triggering
  • Binary classifier(s) to sift volumes of uninteresting data to save on excess cloud storage
  • Establish the best running consensus we can on managing credentials for server-side use with cloud storage
  • Same consensus but for client-side, handling SSO and whether it makes sense to lean on Posit Connect to do that
  • Write-safe options for metadata, general preferences for vector and document stores that are easy to audit and (re)deploy
  • Standards-oriented metadata catalogue interfaces
  • Proof of concept feature extraction from images and sight of its future applications
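
The "poll for new source data, process, upload" stage above could look something like the sketch below. It is only a sketch under stated assumptions: the `*.tif` pattern, the `process` and `upload` callables, and the in-memory `seen` set are all hypothetical stand-ins, and a real version would call an object storage client and persist what it has already handled.

```python
import tempfile
from pathlib import Path


def find_new_files(source_dir, seen, pattern="*.tif"):
    """Return files matching `pattern` whose names are not yet in `seen`."""
    return sorted(p for p in Path(source_dir).glob(pattern) if p.name not in seen)


def poll_once(source_dir, seen, process, upload, pattern="*.tif"):
    """Process and upload each new file once; return the names handled."""
    handled = []
    for path in find_new_files(source_dir, seen, pattern):
        # If process or upload raises, the file is not marked as seen,
        # so it will be picked up again on the next poll.
        upload(path.name, process(path))
        seen.add(path.name)
        handled.append(path.name)
    return handled


# Demo: a temporary directory stands in for the instrument's output folder,
# and a dict stands in for object storage (both are assumptions).
uploaded = {}


def fake_upload(key, data):
    uploaded[key] = data


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.tif").write_bytes(b"one")
    Path(tmp, "b.tif").write_bytes(b"two")
    seen = set()
    first = poll_once(tmp, seen, lambda p: p.read_bytes(), fake_upload)
    Path(tmp, "c.tif").write_bytes(b"three")
    second = poll_once(tmp, seen, lambda p: p.read_bytes(), fake_upload)

print(first, second)
```

Keeping a single `poll_once` step separate from any scheduling means the same function could be called from cron, from an Airflow task, or from a simple loop with a sleep, without manual triggering.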

Plankton imagery flow

@metazool
Collaborator Author

https://nerc-ceh.github.io/plankton_ml/diagrams/ - this is unfinished as it stands. The diagrams that exist were really useful for prompting conversation with support teams about future options for #20 and the pipelines got reused elsewhere.

I think this is a nice visual output that helps with project communication and is useful to refer back to. I'd like to complete the drafts as a relative priority.

@metazool metazool added the documentation Improvements or additions to documentation label Sep 11, 2024