
Make docker container #340

Open
hmacdope opened this issue Apr 11, 2024 · 6 comments

It would be great if you could deploy bespokefit with a docker container.

@jthorton jthorton self-assigned this Apr 11, 2024
hmacdope (Author)

We can also publish docker packages with ghcr.io

j-wags (Member) commented Apr 15, 2024

What's the motivation for deploying on docker? We have to provide documentation and user support for anything we release so I want to make sure this is something that provides our partners value and that we have the time and expertise to maintain. This seems simpler than an entirely new deployment pathway (since it's still just conda under the hood), but every deployment pathway that we have to maintain is an added cost.

We have a dockerhub account that I could give you access to for hosting the images if we go ahead with this. Would the docker image have psi4 and/or xtb (and/or other engines)? And what would the automated testing for this look like?

hmacdope (Author) commented Apr 15, 2024

Fair points @j-wags, I should have explained the motivation a bit better.

A Dockerfile-based container is a very easy way to spool up a service on a host, with all of the dependencies and configuration already taken care of, alongside logging and easy status tracking.

This is most useful in a non-HPC environment where the server is not intended to be ephemeral but rather to run in an always-on mode, which is very common outside of academia and on cloud instances. Docker is pretty much the standard way for service-based architectures to package their work for easy deployment; running a Python process in tmux is probably too fragile for a more heavyweight deployment.

There are two separate issues here that we should try not to conflate.

  1. Having a Dockerfile. This gives people the ability to start bespokefit-server as a Docker service by building it themselves.

For example, you can start a working bespokefit-server instance in two commands (you could do something similar for the executor service if you wanted):

```shell
git clone git@github.com:openforcefield/openff-bespokefit.git
docker-compose up -d
# profit
```

Possible value adds

  • Always-works, portable deployment on any hardware
  • Opens up new types of deployments that need more horizontal scalability or fault tolerance
  • Isolation from the host, which is better for stability and security
  • Deployable in a microservices-based model, where bespokefit has to fit into a chain of services that communicate with each other

A lot of these are probably more relevant to industry than in an academic lab, which is why I wanted to pass this on as an industry adjacent perspective.
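As a sketch of what could sit behind that two-command flow, a minimal docker-compose.yml might look like the following. The service name, build context, port, and volume path are all illustrative assumptions, not bespokefit's actual configuration:

```yaml
# Hypothetical compose file: builds the image from the cloned repo
# and runs the server as a long-lived, auto-restarting service.
services:
  bespokefit-server:
    build: .                  # assumes a Dockerfile at the repo root
    restart: unless-stopped   # keep the service up across crashes/reboots
    ports:
      - "8000:8000"           # illustrative port; match the server's settings
    volumes:
      - ./data:/opt/bespokefit/data   # persist results outside the container
```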

  2. Deploying a Docker-based container to Docker Hub or ghcr.io, container registries that host pre-built containers. I prefer ghcr.io as it is already integrated with GitHub, but Docker Hub is also very easy to integrate.

For example:

```shell
# Hey, what is an easy way to start using bespokefit?
docker run -it ghcr.io/openforcefield/bespokefit
# now in a bespokefit-ready environment with the code all ready to roll
```

Possible value adds

  • Consistent builds you can point people to; no need to debug people's build environments when they can just use the official one.
  • No need to even clone the repo; you can start working with bespokefit straight from a staged container.
  • Save time and energy on your deployment by using a pre-made setup.

I definitely don't want to burden you with any additional maintenance or anything you are not comfortable with, so just let me know either way; I just thought this might be an avenue to explore together 😄. Happy to jump on a call as well.

We can always go ahead in a fork and/or modify the Docker setup to pull in bespokefit from source (in a standalone repo, rather than being checked into this one). That is to say, for our needs there are other options available.

> Would the docker image have psi4 and/or xtb (and/or other engines)?

My understanding is that this is not needed for the server, but if you wanted to do the same with the executor, or package them both in the same container, that is super easy.
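As an illustration of packaging the engines alongside bespokefit, a Dockerfile sketch could install everything via conda in one layer. This is not an official image; the package names are the conda-forge ones, and the entrypoint is an assumption:

```dockerfile
# Hypothetical Dockerfile sketch, not the project's official image.
FROM condaforge/mambaforge:latest

# Install bespokefit plus the optional QC engines in a single solve;
# pinning explicit versions is advisable for reproducible images.
RUN mamba install -y -c conda-forge openff-bespokefit xtb-python psi4 \
    && mamba clean -afy

# Assumed entrypoint; adjust to the server/executor command you need.
ENTRYPOINT ["openff-bespoke"]
```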

> What would automated testing look like?

If you wanted to do 2. (which is also the best way to test 1.), then you can build a container on each commit to main in CI. Given that the container only packages up functionality that is already in the repo, if the tests pass you can be pretty confident that the Dockerfile will work. You can also run the test suite inside the built container pretty easily if you want.
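To sketch what that CI step could look like, a GitHub Actions workflow using the standard docker login and build-push actions might be as follows; the workflow name, tag, and build context are illustrative assumptions:

```yaml
# Illustrative GitHub Actions workflow, not part of the repo.
name: docker
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      packages: write   # needed to push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/openforcefield/bespokefit:latest
```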

j-wags (Member) commented Apr 16, 2024

Thanks for the great explanation, and apologies for something I forgot: Josh H is currently the "owner" of bespokefit, so I shouldn't be stepping in here unless I strongly object to something.

> What would automated testing look like?
> If you wanted to do 2. (also the best way to test 1.) then you can build a container on commit main in CI... You can run the testsuite inside the built container pretty easily if you want also.

Ah, beyond just testing the stuff inside the container, I know very little about "having processes on different computers/containers talking to each other" - presumably there's some configuration of ports to expose and connect to, places to set addresses and tokens - and it's those settings/docs that I'd be most concerned with testing (also I anticipate a lot of the support requests would be about setting this up on different people's clusters).

I am a little nervous that putting this in our repo is a signal to folks that this is a deployment method that we support, and if external people start using it, it will be hard to take back later. Could we keep it in a standalone repo or a branch/fork while we see how it performs, document it, and make any behavior changes?

jthorton (Contributor)

> Ah, beyond just testing the stuff inside the container, I know very little about "having processes on different computers/containers talking to each other" - presumably there's some configuration of ports to expose and connect to, places to set addresses and tokens - and it's those settings/docs that I'd be most concerned with testing (also I anticipate a lot of the support requests would be about setting this up on different people's clusters).

Just to clarify, we are not exposing anything new here: the configuration of ports and addresses is already part of bespokefit, users are free to change these settings to suit their needs, and we advise them to do so when they use a distributed worker setup. The Docker deployment should just make this a lot easier. We essentially want to replicate what alchemiscale does and offer a very easy way to deploy a server and remote workers on any infrastructure, which is a massive selling point for industry users, from my very biased view 😄.
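For illustration only, wiring those existing settings through a container might look like the run command below. The environment variable names here are placeholders, not bespokefit's actual settings:

```shell
# Hypothetical invocation: the -e variable names are placeholders for
# whatever settings bespokefit already reads for its address/port.
docker run -d \
    -p 8000:8000 \
    -e BESPOKEFIT_GATEWAY_ADDRESS=0.0.0.0 \
    -e BESPOKEFIT_GATEWAY_PORT=8000 \
    ghcr.io/openforcefield/bespokefit
```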

j-wags (Member) commented Apr 18, 2024

Gotcha. Proceed as you see fit - just note that once the Docker stuff is pushed to main, it is as good as released and in the public API :-)
