
Make docker container #340

Open
hmacdope opened this issue Apr 11, 2024 · 6 comments

It would be great if you could deploy bespokefit with a docker container.

@jthorton jthorton self-assigned this Apr 11, 2024
hmacdope (Author)

We can also publish docker packages with ghcr.io

j-wags (Member) commented Apr 15, 2024

What's the motivation for deploying on docker? We have to provide documentation and user support for anything we release so I want to make sure this is something that provides our partners value and that we have the time and expertise to maintain. This seems simpler than an entirely new deployment pathway (since it's still just conda under the hood), but every deployment pathway that we have to maintain is an added cost.

We have a dockerhub account that I could give you access to for hosting the images if we go ahead with this. Would the docker image have psi4 and/or xtb (and/or other engines)? And what would the automated testing for this look like?

hmacdope (Author) commented Apr 15, 2024

Fair points @j-wags, I should have explained the motivation a bit better.

A Dockerfile-based container is a very easy way to spool up a service on a host, with all of the dependencies and configuration already taken care of, alongside logging and easy status tracking.

This is most useful in a non-HPC environment where the server is not intended to be ephemeral but rather to run in an always-on mode, which is very common outside of academia and on cloud instances. Docker is pretty much the standard way for service-based architectures to package their work for easy deployment; running a Python process in tmux is probably too fragile for a more heavyweight deployment.

There are two separate issues here that we should try not to conflate.

  1. Having a Dockerfile. This gives people the ability to start bespokefit-server as a Docker service by building it themselves.

For example, you can start a working bespokefit-server instance in two commands (you could do something similar for the executor service if you wanted):

```shell
git clone git@github.com:openforcefield/openff-bespokefit.git
docker-compose up -d
# profit
```

Possible value adds

  • Always-works, portable deployment on any hardware
  • Opens up new types of deployments that need more horizontal scalability or fault tolerance
  • Isolation from the host, which is better for stability and security
  • Deployable in a microservices-based model, where bespokefit has to fit into a chain of services that communicate with each other

A lot of these are probably more relevant to industry than in an academic lab, which is why I wanted to pass this on as an industry adjacent perspective.
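As a sketch of what could sit behind that two-command flow, a minimal docker-compose.yml might look like the following. The service name, build context, port, and volume path are all illustrative assumptions, not bespokefit's actual configuration:

```yaml
# Hypothetical compose file: builds the image from the cloned repo
# and runs the server as a long-lived, auto-restarting service.
services:
  bespokefit-server:
    build: .                  # assumes a Dockerfile at the repo root
    restart: unless-stopped   # keep the service up across crashes/reboots
    ports:
      - "8000:8000"           # illustrative port; match the server's settings
    volumes:
      - ./data:/opt/bespokefit/data   # persist results outside the container
```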

  2. Deploying a Docker-based container to Docker Hub or ghcr.io, container registries that host pre-built containers. I prefer ghcr.io as it is already integrated with GitHub, but Docker Hub is also very easy to integrate.

For example:

```shell
# Hey, what is an easy way to start using bespokefit?
docker run -it ghcr.io/openforcefield/bespokefit
# now in a bespokefit-ready environment with the code all ready to roll
```

Possible value adds

  • Consistent builds you can point people to; no need to debug people's build environments when they can just use the official one.
  • No need to even clone the repo; you can start working with bespokefit straight from a staged container.
  • Save time and energy on your deployment by using a pre-made setup.

I definitely don't want to burden you with any additional maintenance or anything you are not comfortable with, so just let me know either way; I just thought this might be an avenue to explore together 😄. Happy to jump on a call as well.

We can always go ahead in a fork and/or modify the Docker setup to pull in bespokefit from source (in a standalone repo, rather than being checked into this one). That is to say, for our needs there are other options available.

> Would the docker image have psi4 and/or xtb (and/or other engines)?

My understanding is that this is not needed for the server, but if you wanted to do the same with the executor, or package them both in the same container, that is super easy.
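As an illustration of packaging the engines alongside bespokefit, a Dockerfile sketch could install everything via conda in one layer. This is not an official image; the package names are the conda-forge ones, and the entrypoint is an assumption:

```dockerfile
# Hypothetical Dockerfile sketch, not the project's official image.
FROM condaforge/mambaforge:latest

# Install bespokefit plus the optional QC engines in a single solve;
# pinning explicit versions is advisable for reproducible images.
RUN mamba install -y -c conda-forge openff-bespokefit xtb-python psi4 \
    && mamba clean -afy

# Assumed entrypoint; adjust to the server/executor command you need.
ENTRYPOINT ["openff-bespoke"]
```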

> What would automated testing look like?

If you wanted to do 2. (which is also the best way to test 1.), then you can build a container on each commit to main in CI. Given that the container only packages up functionality that is already in the repo, if the tests pass you can be pretty confident that the Dockerfile will work. You can also run the test suite inside the built container pretty easily if you want.
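To sketch what that CI step could look like, a GitHub Actions workflow using the standard docker login and build-push actions might be as follows; the workflow name, tag, and build context are illustrative assumptions:

```yaml
# Illustrative GitHub Actions workflow, not part of the repo.
name: docker
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      packages: write   # needed to push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/openforcefield/bespokefit:latest
```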

j-wags (Member) commented Apr 16, 2024

Thanks for the great explanation, and apologies for something I forgot: Josh H is currently the "owner" of bespokefit, so I shouldn't be stepping in here unless I strongly object to something.

> What would automated testing look like?
> If you wanted to do 2. (also the best way to test 1.) then you can build a container on commit main in CI... You can run the testsuite inside the built container pretty easily if you want also.

Ah, beyond just testing the stuff inside the container, I know very little about "having processes on different computers/containers talking to each other" - presumably there's some configuration of ports to expose and connect to, places to set addresses and tokens - and it's those settings/docs that I'd be most concerned with testing (also I anticipate a lot of the support requests would be about setting this up on different people's clusters).

I am a little nervous that putting this in our repo is a signal to folks that this is a deployment method that we support, and if external people start using it, it will be hard to take back later. Could we keep it in a standalone repo or a branch/fork while we see how it performs, document it, and make any behavior changes?

jthorton (Contributor)

> Ah, beyond just testing the stuff inside the container, I know very little about "having processes on different computers/containers talking to each other" - presumably there's some configuration of ports to expose and connect to, places to set addresses and tokens - and it's those settings/docs that I'd be most concerned with testing (also I anticipate a lot of the support requests would be about setting this up on different people's clusters).

Just to clarify, we are not exposing anything new here: the configuration of ports and addresses is already part of bespokefit, users are free to change these settings to suit their needs, and we advise them to do so when they use a distributed worker setup. The Docker deployment should just make this a lot easier. We essentially want to replicate what alchemiscale does and offer a very easy way to deploy a server and remote workers on any infrastructure, which is a massive selling point for industry users, from my very biased view 😄.
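For illustration only, wiring those existing settings through a container might look like the run command below. The environment variable names here are placeholders, not bespokefit's actual settings:

```shell
# Hypothetical invocation: the -e variable names are placeholders for
# whatever settings bespokefit already reads for its address/port.
docker run -d \
    -p 8000:8000 \
    -e BESPOKEFIT_GATEWAY_ADDRESS=0.0.0.0 \
    -e BESPOKEFIT_GATEWAY_PORT=8000 \
    ghcr.io/openforcefield/bespokefit
```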

j-wags (Member) commented Apr 18, 2024

Gotcha. Proceed as you see fit - just note that once the Docker stuff is pushed to main, it is as good as released and in the public API :-)
