Deployment of image processing, ideally somewhere maintained by others! #47

Open
metazool opened this issue Nov 11, 2024 · 3 comments

@metazool

There is a range of options for this:

[ ] luigid running on a development VM in the on-prem cloud with direct read access to NAS
[ ] chromadb also running locally on the same machine
[ ] Object store API in Posit or Datalabs (how do apps then authenticate?)

[ ] luigid in a container on e.g. Kubernetes in the on-prem cloud, but with a means of mounting data from the NAS
[ ] Data from the NAS going unprocessed to an object store, and the pipelines reading from there, obviating the need to connect applications to local storage
[ ] tasks running in e.g. Airflow rather than within Luigi

draw.io diagram

@metazool

https://github.com/NERC-CEH/plankton_ml/blob/main/PIPELINES.md has the walkthrough of setting up the Luigi-based workflow (from NAS to object store).

@metazool

metazool commented Nov 20, 2024

https://luigi.readthedocs.io/en/stable/central_scheduler.html has the docs for the central scheduler. It's more of a task manager and UI, and reminds me of Celery Flower; the expectation with Luigi is that you use cron or similar to trigger tasks.

While testing this, my connection to the VM died; luigid stayed running in the background but lost its memory of the tasks run in my session. You can configure it to use a SQLAlchemy connection to preserve that memory. I've got a Postgres instance available here. I'd still like to look at alternatives to chromadb (#44) for vector types in a more typical SQL database.
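The SQLAlchemy-backed history mentioned above might be configured along these lines. This is a minimal sketch, assuming the `[scheduler] record_task_history` and `[task_history] db_connection` settings described in the central scheduler docs; `build_luigi_cfg` is a hypothetical helper, and the Postgres URL and credentials are placeholders rather than the real instance.

```python
import configparser
import io


def build_luigi_cfg(db_url: str) -> str:
    """Return luigi.cfg text that turns on persistent task history."""
    cfg = configparser.ConfigParser()
    # Ask the central scheduler to record task history at all.
    cfg["scheduler"] = {"record_task_history": "True"}
    # SQLAlchemy connection string for the history database.
    cfg["task_history"] = {"db_connection": db_url}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Hypothetical connection string; substitute the real host and credentials.
EXAMPLE_DB_URL = "postgresql://luigi:change-me@localhost:5432/luigi_history"

if __name__ == "__main__":
    print(build_luigi_cfg(EXAMPLE_DB_URL))
```

The printed text can be saved as `luigi.cfg` in the directory luigid is started from. Generating it from a function keeps the connection string out of version control if it is read from the environment instead.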

@metazool

metazool commented Nov 20, 2024

I set up a luigi.cfg with a sqlite backend and immediately ran into this: spotify/luigi#3227

It (should be?) a small change and I might try to contribute it right now, but I'm worried for our Luigi usage that it's been a known issue for over a year and version pinning to pre-2.0 is still the suggestion.

Edit: now at Luigi's equivalent of this issue with tox 4, and slightly regretting life choices: https://github.com/python/mypy/pull/14578/files

Edit: seeing there's already an unmerged PR with the same set of changes I was considering, I'll try to leave a helpful comment there and then just pin sqlalchemy: spotify/luigi#3267
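The pin itself is usually a one-line constraint in the requirements file (for example `sqlalchemy<2.0`). A small guard like the one below could also fail fast if a newer version sneaks into the environment. It is only a sketch: `is_pre_2` and `check_sqlalchemy_pin` are hypothetical helper names, and it assumes the major version is the first dot-separated component of the version string.

```python
from importlib import metadata


def is_pre_2(version: str) -> bool:
    """True if the major component of a version string is below 2."""
    major = version.split(".", 1)[0]
    return int(major) < 2


def check_sqlalchemy_pin() -> None:
    """Raise if the installed SQLAlchemy is 2.0 or later."""
    try:
        installed = metadata.version("sqlalchemy")
    except metadata.PackageNotFoundError:
        return  # nothing installed, so nothing to check
    if not is_pre_2(installed):
        raise RuntimeError(
            f"SQLAlchemy {installed} found; expected a pre-2.0 version"
        )
```

Calling `check_sqlalchemy_pin()` at the top of the pipeline entry point would surface a broken pin before luigid tries to open the history database.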
