
Docs: recommended workflow for driving post processing with pqact from containers #94

Open
deeplycloudy opened this issue May 13, 2022 · 5 comments


@deeplycloudy

The non-container LDM workflow allows easy post-processing with external scripts driven by pqact. For instance, the Unidata ldm-alchemy project is used to stitch together GOES imagery tiles, and simply receives the product stream over a PIPE. (I'm interested in doing this over the next week)

I'm opening this issue to request advice (or documentation) on how each of the pqact action categories should be used in the context of an isolated, containerized environment that (by philosophy) isolates access to other software installations and inter-process communication. I can think of a few ways:

  1. fork ldm-docker and heavily customize the Dockerfile to add additional post processing to same container
  2. expose a PIPE with ldm-docker that is read by another container. (If that's even possible!)
  3. only use ldm-docker to write files, and then have other processes watch that directory for further processing (see the sketch after this list).
  4. somehow have LDM trigger a script in another container
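For option 3, here's a minimal sketch of what the watcher side might look like, assuming the LDM container FILEs products into a volume mounted at /data/ldm/incoming in both containers (the path and the process() placeholder are just illustrative):

```python
import os
import time

# Hypothetical path: a volume mounted in both the LDM container and this one.
WATCH_DIR = "/data/ldm/incoming"

def process(path):
    # Placeholder for the real post-processing (e.g. stitching imagery tiles).
    print(f"new product: {path}")

seen = set()
while True:
    for entry in os.scandir(WATCH_DIR):
        if entry.is_file() and entry.path not in seen:
            seen.add(entry.path)
            process(entry.path)
    time.sleep(2)  # crude polling; an inotify-based watcher would be less wasteful
```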
@julienchastang
Member

Hi Eric. Thank you for these suggestions. Tom and I discussed these ideas when we were first getting this project started, and I remember going through a similar thought process for points 1-4. I believe inter-container communication is possible, for example, though I have not tried it personally. (The TDM and TDS Docker containers do communicate, but not specifically using Docker technology; the communication there is HTTP driven and occurs between different VMs.) I am juggling a few tasks right now, but I will try to dig into this more deeply next week.

Pinging your former student and new Unidata employee Bobby Espinoza :-) (@robertej09), who is working on these containers with me.

@deeplycloudy
Author

Thanks, @julienchastang and hello again, @robertej09!

I've seen some advice to use mkfifo to set up a named pipe between Docker containers, which sounds like it might provide the data flow path pretty elegantly. But it still requires being able to tell a script in the outside world to start listening at the right time.
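For concreteness, here's a minimal sketch of what the reading side might look like, assuming a FIFO at /data/pipes/ldm_products on a volume shared by both containers (the path is made up):

```python
import os

# Hypothetical FIFO path on a volume shared by both containers.
FIFO = "/data/pipes/ldm_products"

if not os.path.exists(FIFO):
    os.mkfifo(FIFO)

while True:
    # open() blocks until a writer opens the pipe, so this container can
    # simply sit here waiting for the LDM side to start sending data.
    with open(FIFO, "rb") as pipe:
        for chunk in iter(lambda: pipe.read(65536), b""):
            # Hand each chunk off to the real post-processing here.
            print(f"received {len(chunk)} bytes")
```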

I think, since Docker grew out of the HTTP world, it's pretty normal to communicate through services at an HTTP API endpoint, and so you can trigger things that way. But "just set up an HTTP API" is not quite as trivial for us occasional Unix/server dabblers! Easier with TDS where there's already a server, etc.
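If we did go the HTTP route, I imagine the receiving side could be as small as something like this (stdlib only; the port and path are arbitrary):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class ProductHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length)
        # Real post-processing of `data` would go here.
        print(f"received {len(data)} bytes at {self.path}")
        self.send_response(204)  # nothing to send back
        self.end_headers()

# Bind to 0.0.0.0 so other containers on the compose network can reach it.
HTTPServer(("0.0.0.0", 8080), ProductHandler).serve_forever()
```

On the LDM side, a PIPE action could then hand the product to something like `curl -s --data-binary @- http://postproc:8080/product` (the service name and port here are made up).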

@ana-v-espinoza
Contributor

Hey all,

Great to hear from you, Eric (@deeplycloudy)! We'll look into these various options. While modifying the Dockerfile seems like the easiest solution, setting up inter-container communication is more flexible in the long run, and I can see a future where different kinds of post-processing scripts, jobs, etc. can all be containerized and "plug-and-play" through docker-compose.

We'll keep in touch to see where this goes. Send my hellos to the folks over at Tech!

@julienchastang
Member

Hi @deeplycloudy. @robertej09 and I were chatting this AM. In short, option 1 may be the path of least resistance: either fork the Dockerfile in question, or inherit from unidata/ldm-docker (e.g., FROM unidata/ldm-docker). But options 2-4 may be better in the long run because of greater flexibility, as Bobby mentions, and improved separation of concerns. The different Docker containers can be coordinated through docker-compose. Here, for example, there is some coordination going on via the file system (i.e., the /data directory), but that is a somewhat crude coordination mechanism. Better would be to work with pipes between containers, perhaps also coordinated via docker-compose. Unfortunately, I have never worked with pipes between containers, and it would take some experimentation to find out whether this is a viable option. In sum: quick and dirty, inherit / modify; longer term, inter-container coordination via docker-compose.

(PS: Bobby and I were just chatting again, and Bobby will start digging into the coordination of Docker containers via pipes this PM. I think Bobby will have some follow-up questions.)
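To sketch the pipe idea (untested; the FIFO path is illustrative and must live on a volume mounted in both containers): the pqact PIPE action could invoke a small script in the LDM container that simply forwards the product into a FIFO on a shared volume, with the post-processing container reading from the other end, roughly:

```python
#!/usr/bin/env python3
# Sketch of the LDM-side half: a pqact PIPE action hands the product to this
# script on stdin, and it forwards the bytes into the shared FIFO.
import shutil
import sys

FIFO = "/data/pipes/ldm_products"  # illustrative shared-volume path

# Opening a FIFO for writing blocks until the reader container has it open.
with open(FIFO, "wb") as pipe:
    shutil.copyfileobj(sys.stdin.buffer, pipe)
```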

@ana-v-espinoza
Contributor

Hello again @deeplycloudy ,

Too Long; Didn't Read:
We have worked out inter-container data transfer through named pipes, identified several ways forward, and have a question for you so that we all share the same vision.

I wanted to give you a quick update on our thoughts and progress concerning dockerized post-processing. As has already been mentioned, and as @julienchastang summarized above, your Option 1 seems like the easiest/quickest way forward. Regardless, I spent some time yesterday afternoon and this morning developing a minimal example demonstrating how a named pipe created with mkfifo could be used for inter-container data transfer. You can find it here.

I also had a meeting with Ryan May to discuss how the data flows from the LDM to the ldm-alchemy Python scripts you had mentioned, and it seems the problem of the LDM "being able to tell a script in the outside world to start listening at the right time" is solved by having the ingesting script/job/container always be listening.

To expand a little bit on the repo I linked above, I believe there are a few ways we could proceed:

  1. Potentially use a FILE action to write directly to a named pipe shared between containers (after all, a named pipe is a file, like everything else in UNIX)
  2. Use a PIPE action to pipe incoming data into a script that either:
    2a) Re-pipes the data into a named pipe shared between containers, or
    2b) Sends the data to the appropriate container through a socket (see the socket sketch below)

Of these options, 2) seems more flexible, as the script the data is initially piped into could perform some potentially useful logic. I believe 2b) would be the easiest to orchestrate with docker-compose, as docker-compose automatically creates so-called "bridge networks," which should in principle make communicating over the network simple and require less initial setup for the end user. The only issue is that I don't think either of us has much experience with that aspect of Docker!
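As a rough sketch of what the receiving end of 2b) might look like (untested; the port is arbitrary, and with docker-compose the LDM-side script would reach this container by its service name over the default bridge network):

```python
import socket

HOST, PORT = "0.0.0.0", 9000  # illustrative port

with socket.create_server((HOST, PORT)) as server:
    while True:
        conn, addr = server.accept()
        with conn:
            product = bytearray()
            while True:
                chunk = conn.recv(65536)
                if not chunk:
                    break
                product.extend(chunk)
            # Real post-processing of the assembled product would go here.
            print(f"received {len(product)} bytes from {addr}")
```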

Nevertheless, since we have something functioning (in principle) at the moment, we will be attempting to set up a more practical proof of concept using an ldm-docker container in the near future. Could you give us some more detail on the type of data you're expecting and the post-processing tasks you plan on performing? While that shouldn't matter much right at this moment, it would be helpful for all of us to have the same vision so we don't get carried away and lost in the details!

Let us know if you have any questions or suggestions. Thanks!
