Provide support for runtime creation of Docker-managed volumes #270

Open · wphicks opened this issue Jan 31, 2018 · 14 comments

wphicks commented Jan 31, 2018

It should be possible to create a Docker-managed volume (not a bind-mount) and use it in the context of a girder_worker instance running inside Docker that launches Docker tasks.

kotfic (Contributor) commented Jan 31, 2018

This would resolve issues where the worker docker container needs to mount the host /tmp inside the container.

zachmullen (Member) commented

@kotfic that can be accomplished with bind mounts. @wphicks could you describe your use case in more detail so we know what sort of API we'll need to expose?

cjh1 (Contributor) commented Jan 31, 2018

@kotfic Not sure I understand the issue with the worker docker containers needing to mount the host /tmp. The user doesn't have to do this? It just provides them with a temporary directory shared between the host and the container, which is a nice way to get results out of a container.

kotfic (Contributor) commented Jan 31, 2018

So this is applicable in situations where docker is running alongside docker (e.g. girder_worker running in a container and a task running in a container). Currently the TemporaryNamedVolume transform tries to mount the worker container's /tmp directory into the task container's /tmp directory. Because the worker talks to the docker engine via a mounted socket file, it is actually talking to the docker engine running on the host machine. That means when the worker container requests that /tmp/whatever/foo.jpg be mounted inside the task container, the request is passed to the docker engine running on the host, and the engine mounts the host's /tmp/whatever/foo.jpg inside the task container. This requires that the host /tmp directory be mounted at the worker container's /tmp so that the host, the worker, and the task all share the same /tmp directory.

Mounting the host /tmp directory into the worker /tmp directory must be performed at deploy time (for instance, here).

Alternatively, if we used docker-py to create a Docker-managed volume, that volume would exist on the host under /var/lib/docker/volumes/... and could be mounted at run time inside both the worker container and the task container.
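
For illustration, here is a minimal docker-py sketch of the managed-volume approach (image, volume, and mount names are examples, not anything girder_worker does today). Because a managed volume is referenced by name rather than by host path, the same reference works no matter where the client is running:

    import docker

    # Via the mounted socket file, this client talks to the *host* daemon
    # even when this code runs inside the worker container.
    client = docker.from_env()

    # The volume's data lives on the host (by default under
    # /var/lib/docker/volumes/<name>/_data).
    vol = client.volumes.create(name='gw_scratch_example')

    # Mount the volume into a task container by name; no shared host /tmp
    # directory is required.
    client.containers.run(
        'busybox',
        'sh -c "echo hello > /mnt/scratch/result.txt"',
        volumes={vol.name: {'bind': '/mnt/scratch', 'mode': 'rw'}},
        remove=True,
    )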

zachmullen (Member) commented

Ah, understood, thanks for clarifying.

zachmullen (Member) commented

Is this something we may need to break the current volume-related API for?

cjh1 (Contributor) commented Jan 31, 2018

@kotfic that makes sense now.

kotfic (Contributor) commented Jan 31, 2018

@zachmullen No, I think this can probably be addressed without breaking any API surface. I think/hope it is just a different set of transforms and maybe some small internal changes (e.g. how we make sure Volume transforms are identified in container_args and girder_result_hooks and added to docker-py's volume dictionary).
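
To make that concrete, here is a purely speculative sketch of what such a transform might look like. DockerManagedVolume and docker_volume_entry are hypothetical names, not existing girder_worker APIs, and the base class is assumed to come from girder_worker_utils:

    import docker
    from girder_worker_utils.transform import Transform

    class DockerManagedVolume(Transform):
        """Hypothetical transform backed by a Docker-managed volume."""

        def __init__(self, container_path):
            self.container_path = container_path
            self._volume = None

        def transform(self, **kwargs):
            # Create an anonymous managed volume on the host daemon.
            self._volume = docker.from_env().volumes.create()
            # The task only ever sees the in-container path.
            return self.container_path

        def docker_volume_entry(self):
            # Entry the worker would merge into docker-py's volumes dict.
            return {self._volume.name: {'bind': self.container_path, 'mode': 'rw'}}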

cjh1 (Contributor) commented Jan 31, 2018

@kotfic Agreed, I think this can be encapsulated in a new transform or two.

zachmullen (Member) commented

If it's a different set of transforms, doesn't that mean that either

  1. The client needs to know internal details about how the workers are deployed and choose to use the existing transforms or the new ones
  2. The client should always choose the new transforms, essentially deprecating the old ones

?

cjh1 (Contributor) commented Jan 31, 2018

@zachmullen The existing ones would still be used in the case of bind-mounting an existing host directory, which is a different use case.
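
In docker-py terms the two cases differ only in the key of the volumes dictionary (paths and names below are illustrative): an absolute host path produces a bind mount, while any other key is treated as the name of a managed volume.

    import docker

    client = docker.from_env()
    client.containers.run(
        'busybox', 'ls /mnt',
        volumes={
            # Existing use case: bind-mount a host directory
            '/data/on/host': {'bind': '/mnt/input', 'mode': 'ro'},
            # Proposed use case: mount a Docker-managed volume by name
            'managed_scratch': {'bind': '/mnt/scratch', 'mode': 'rw'},
        },
        remove=True,
    )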

zachmullen (Member) commented Jan 31, 2018

@cjh1 I'm not sure I understand; to motivate this, here's an example I'm currently using:

    # Imports for context (this snippet comes from inside a request handler;
    # _PHI_SAMPLES, _THETA_SAMPLES, _SIZE, preset, files, and item are
    # defined by the surrounding code):
    import json

    from girder_worker.docker.tasks import docker_run
    from girder_worker.docker.transforms import VolumePath
    from girder_worker.docker.transforms.girder import (
        GirderFileIdToVolume, GirderUploadVolumePathToItem)

    outdir = VolumePath('__thumbnails_output__')
    return docker_run.delay(
        'zachmullen/3d_thumbnails:latest', container_args=[
            '--phi-samples', str(_PHI_SAMPLES),
            '--theta-samples', str(_THETA_SAMPLES),
            '--width', str(_SIZE),
            '--height', str(_SIZE),
            '--preset', preset,
            GirderFileIdToVolume(files[0]['_id']),
            outdir
        ], girder_job_title='Interactive thumbnail generation: %s' % item['name'],
        girder_result_hooks=[
            GirderUploadVolumePathToItem(outdir, item['_id'], upload_kwargs={
                'reference': json.dumps({'interactive_thumbnail': True})
            })
        ]).job

As a client, I don't want to have to know whether the worker is running on the host or in a container; I just want an ephemeral directory for I/O. In my case I don't declare any specific volume, so my understanding is that it defaults to the temporary volume. Would this still work in both deployment architectures once this is fixed?

cjh1 (Contributor) commented Jan 31, 2018

@zachmullen What I was trying to say is that the Volume transform would still be preserved for bind mounting, which is still a valid use case. In your use case (where the default temporary volume is being used) we could move to managed volumes without breaking things.

kotfic (Contributor) commented Jan 31, 2018

Probably in the long run we should default to managed volumes, as they are Docker's recommended way of managing data external to a container. Managed volumes work whether girder_worker is running alongside docker (in a container) or as a regular process, and this shouldn't affect the API of your example.

We may need to provide some kind of flag or configuration whereby girder_worker can inform its transforms that it is running inside a docker container, so that docker-specific transforms can handle behavior differently. That would be some kind of run-time configuration (i.e., set when running girder-worker). At the end of the day, though, this is all speculative. There is no urgent need to implement or change anything; I just wanted to capture the somewhat complex behavior related to mounting from one container to another container through the host and note that there may be better approaches to investigate.
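
One possible shape for such a check, sketched under the assumption that an explicit deploy-time flag is preferable to guessing (GW_INSIDE_DOCKER is a hypothetical name, not an existing girder_worker setting):

    import os

    def running_inside_docker():
        # Prefer an explicit opt-in flag set when deploying the worker...
        if os.environ.get('GW_INSIDE_DOCKER'):
            return True
        # ...falling back to a common (but not guaranteed) heuristic.
        return os.path.exists('/.dockerenv')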
