vnc for openAI gym #31
Comments
Could you send us the floydhub job URL so I can try to reproduce the error on my side? |
Hi, I'm actually running this locally. My goal is to be able to seamlessly transition from docker locally to docker on floydhub or elsewhere. I have seen some people say that the trick is to install the nvidia drivers with the option --no-opengl-files. So again, all I'm doing is essentially Note: I built this entire machine from scratch precisely for working within Docker locally on OpenAI gym and pytorch, so I'm willing to do whatever it takes, up to and including formatting the whole machine. I just need a reliable setup, and floydhub/pytorch has everything except a way to actually run the environments. Again, note that the environments do actually run when I just run docker floydhub/pytorch (vs nvidia-docker), but then no GPU is found. |
Could you create a minimal reproducible example on Floydhub? That can help reduce the time for us to trace down the root cause if we can reliably reproduce it on our side. |
@houqp yesterday I reinstalled Ubuntu from scratch, updated packages, blacklisted nouveau, installed build-essential, and installed the nvidia runfile driver (375.86) with --no-opengl-files, saying "no" to any offers to modify xconfig. Then I installed docker and nvidia-docker and ran floydhub/pytorch; the nvidia drivers are loaded and there are no errors when running against xvfb. So the error part of this issue can be pinned on the Ubuntu PPA nvidia driver (which has no option to install without messing up opengl). However, now that everything is working correctly, the title of this issue comes into play. Since openAI gym is a graphical environment, what is the recommended way to "see" the environments rendered when running floydhub/pytorch locally (i.e. with a Monitor)? Does it have vnc installed? How do I run it? How do I connect from the host OS? Obviously I can make my own image on top of this, but is there already a vnc setup in the image? What I suggest (but I defer to your experience) is to have a script similar to your run_jupyter.sh, called run_vnc.sh, which exposes 5999 or whatever. It's not necessary to install any "desktop" or X server; xvfb will do fine (at least for gym), so just a vncserver to optionally "see" what's going on. You know what would be really great is if someone figured out how to render live atari games within jupyter.../offtopic |
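For anyone who wants to try the run_vnc.sh idea before it lands in an image, here is a minimal sketch of what such a launcher could do, written in Python to match the other snippets in this thread. It is not something shipped with floydhub/pytorch. It assumes Xvfb and x11vnc are installed in the container (x11vnc is my assumption; the comment above only asks for "a vncserver"), and the display number :99 and port 5900 are arbitrary choices.

```python
import shutil
import subprocess
import time


def build_commands(display=":99", vnc_port=5900, geometry="1024x768x24"):
    """Return the Xvfb and x11vnc command lines as argument lists."""
    # Virtual framebuffer: an X server with no physical screen attached.
    xvfb = ["Xvfb", display, "-screen", "0", geometry]
    # -forever keeps serving after a client disconnects; -nopw disables the
    # password, which is only acceptable on a trusted local machine.
    vnc = ["x11vnc", "-display", display, "-rfbport", str(vnc_port),
           "-forever", "-shared", "-nopw"]
    return xvfb, vnc


def start(display=":99", vnc_port=5900):
    """Start Xvfb, then x11vnc, and return both process handles."""
    xvfb_cmd, vnc_cmd = build_commands(display, vnc_port)
    missing = [c[0] for c in (xvfb_cmd, vnc_cmd) if shutil.which(c[0]) is None]
    if missing:
        raise RuntimeError("not installed: " + ", ".join(missing))
    xvfb = subprocess.Popen(xvfb_cmd)
    time.sleep(1)  # crude wait so the X server is up before x11vnc attaches
    vnc = subprocess.Popen(vnc_cmd)
    return xvfb, vnc
```

To use it, the container would need the VNC port published (for example with docker's -p 5900:5900), and the process running gym would need DISPLAY set to the same display number before calling env.render().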
Note: here I posted a question describing the exact setup (just 2 commands on top of floydhub/pytorch). You can see that the rendering is wrong, which underscores the need for floydhub to have that "already set up" in the image. I would truly appreciate it if someone with experience in installing all the packages included this. I suggest a new image, floydhub/pytorch-gui. Then we could just use the image, connect over vnc and get right to work, which is floydhub's goal. It's crazy that I've spent 12 hours just trying to get a visual of openAI gym (and it's still not working), and it's sad that each developer would need to do the same before starting to work on deep learning code. Not everything is web-based or console-based, so vnc is needed. |
I think it would be useful to add vnc streaming support to the base image so all environments will have access to it. I noticed you also installed fluxbox; is that a hard dependency? Sorry, we are working on releasing a major update to the platform, so I won't have enough time to work on this in the upcoming weeks. That said, you are welcome to submit a patch to the dockerfile at https://github.com/floydhub/dockerfiles/blob/master/dl/dl-base/dl-base-1.x.x.jinja. I can help test and build the new image for you. |
@houqp |
I have a feeling that there is still space for optimization, we can probably get away without a desktop environment. |
@houqp So I got sidetracked from this project doing the Coursera DeepLearning Specialization, but when that's done I really want to nail the stack for local, docker-based OpenAI gym development (especially atari). You're probably right that a full desktop is overkill. We just need working VNC of the gym environment(s) and any plot windows as well. I'm not familiar enough to know the best-practices way to do that. |
Sounds good, looking forward to working with you on that when you finish the class. This will benefit a lot of people :) |
So based on the info at openAI for universe, I did the following:
It seems to be important to map the docker socket in there so that universe can start its own docker containers. With this setup I can run the environments, and they appear in vnc at localhost:5900 or on the web at localhost:15900 (they have a built-in web-based vnc client). As an example, when running the above container, I can write the example code below in a python script, run it, and view it from my machine at that web address.
However, this is all for the 'universe' version of the atari environments, not the native gym one. The difference is that the universe ones are wrapped in docker and the observation comes over vnc, so the frame rate is capped and there is network overhead and so training is slower, and the reward signal has lag. So I still would like to be able to do the same with the native gym atari environments, which are also already installed in the floydhub/pytorch image. But this is a good start. Pytorch + atari in one floydhub image and simple command line, without installing anything else. |
This is awesome, I am going to look into it today. |
EDIT: Sorry, I missed a detail in your previous comment. You were trying to get the atari games working. It looks like none of the atari games are working properly. |
On the other hand, I am able to save the contents of the screen locally as an image, and the content is correct:
import gym
env = gym.make('Pong-v0')
env.reset()
env.env.ale.saveScreenPNG('test_image.png') |
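Since frames captured off-screen look correct, a possible stopgap while the on-screen rendering is broken is to dump rendered frames to disk and inspect them on the host. The sketch below is only an illustration: dump_frames is a hypothetical helper, not part of gym. It assumes the gym API of this period, where env.render(mode="rgb_array") returns a height x width x 3 array of uint8 values and env.step() returns the done flag as its third element. It writes binary PPM files so that it needs nothing beyond the standard library.

```python
import os


def dump_frames(env, n_frames, out_dir):
    """Step env with random actions and write each rendered frame as a PPM file."""
    os.makedirs(out_dir, exist_ok=True)
    env.reset()
    paths = []
    for i in range(n_frames):
        done = env.step(env.action_space.sample())[2]
        # Assumed shape: height x width x 3, values 0..255 (uint8 for numpy).
        frame = env.render(mode="rgb_array")
        height, width = len(frame), len(frame[0])
        if hasattr(frame, "tobytes"):
            pixels = frame.tobytes()  # numpy array path
        else:
            pixels = bytes(v for row in frame for px in row for v in px)
        path = os.path.join(out_dir, "frame_%04d.ppm" % i)
        with open(path, "wb") as f:
            f.write(b"P6\n%d %d\n255\n" % (width, height))  # binary PPM header
            f.write(pixels)
        paths.append(path)
        if done:
            env.reset()
    return paths
```

With the environment from the snippet above, this would be called as dump_frames(gym.make('Pong-v0'), 10, 'frames'). Newer gym releases select the render mode when the environment is created rather than in render(), so the call would need adjusting there.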
Yeah, I think the next step is to skip the gym abstraction and troubleshoot this with vanilla ALE. |
Any update? Running into the same issue. |
Unfortunately, I haven't had time to dig into it. @sagelywizard are you interested in helping? |
@houqp @sagelywizard I haven't had a chance to get back to this, as I've been doing the 3rd and 4th courses in the Andrew Ng Deep Learning Specialization on Coursera. However, I did actually try installing gym on the official pytorch image, and that worked with no graphics glitches. So I don't think the solution lies in troubleshooting ALE directly, but rather in a divide-and-conquer approach of removing other dependencies in the image. See this thread: But it confirms we shouldn't need any 'desktop' or anything special to get it working. It will be really nice to have that working on these floydhub images, because they are so well organized and have the other needed tools already built in. |
@AwokeKnowing this is very useful info, thanks! We can start with the https://hub.docker.com/r/floydhub/dl-base image and see if the glitches still happen at the base layer. |
@houqp So I finished the DeepLearning specialization and looked back at this issue. Now I have found the sad reality. It turns out that the openai universe team seems not to be working closely with the openai gym team (or maybe they abandoned universe and are all working on gym). Universe requires a very old version of gym, and there have been a number of breaking changes since then. So if you update gym, the graphics issue is completely fixed and everything works fine with x-forwarding the atari gym env. However, updating gym breaks universe, which is not compatible with the updated gym. I'm not sure how you would like to handle that. There is more work on gym than on universe, mostly, I guess, because universe is a much harder problem due to lag and is more resource intensive. So one solution would be to install the new gym and leave universe out. Then you could have a separate universe image (perhaps on top of the pytorch image, which has about everything). Or possibly you could figure out how to do it in a virtualenv and perhaps have an -e GYM=universe environment flag which somehow selects whether you want gym to be the current gym or the very old gym for universe. But overall, I believe that at this point having an up-to-date gym is more in line with current trends. Perhaps (or perhaps not) they will eventually refactor universe, but my expectation from what I have read is that it would probably be a significant refactor and is not coming soon. Edit: In fact, it seems universe is not going to be further developed: So I'm officially suggesting you just remove it and put in the current gym. Something like:
Edit: So, for testing I basically concatenated all the pytorch parent dockerfiles into one long dockerfile. Then I based it on the new cudagl image for opengl support. On that image, gym works fine by just installing it. It's unclear exactly why just installing gym doesn't work on the floydhub pytorch image; all I can think of is that it's because of opengl. For reference, here's the Dockerfile I used. It basically runs through all the same stuff as floydhub/pytorch and then adds gym at the end. You can run it like this (on a local machine with nvidia):
So please try it at:
That should pop up like: |
Thanks @AwokeKnowing for digging into the root cause! I agree with you. If universe is not active anymore, we should exclude it from the default package list. We are about to rebuild all images for a newer version of cuda and will incorporate your change in the new release. |
@AwokeKnowing we released a new set of images (pytorch-0.3.1 and tensorflow-1.7) last week with universe removed. Do you mind giving it a try? I will test this on a linux machine with a GUI environment later today. |
openAI gym is working in the pytorch image, but actually running the atari environments fails due to the lack of x11. What's the best way to get it to where we can see the environments? I tried adding a desktop with vnc, but it messed up the gpu support.
In my attempts, strangely enough, the atari environments would run when I ran the container with docker, but when I run the container with nvidia-docker (gpu enabled), the gpu is definitely working (torch.cuda.is_available()) but the atari games error out on
To be clear, I have an Nvidia GPU. My goal is to work with pytorch and openai gym. Your pytorch image is perfect, except that I need to see the atari environments. I don't know how to see them, but I have been trying for 16 hours. I can follow instructions, but I can't seem to find any.
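When comparing the docker and nvidia-docker runs, it may help to print the same two facts inside each container: whether an X display is configured, and whether torch sees the GPU. The helper below is a hypothetical sketch (diagnose is not part of any library). It assumes that the "lack of x11" shows up as an unset DISPLAY variable, which is only one possible cause. It reports None for CUDA when torch is not installed.

```python
import importlib.util
import os


def diagnose(environ=None):
    """Report the configured X display and whether torch can see a CUDA device."""
    environ = os.environ if environ is None else environ
    report = {"display": environ.get("DISPLAY") or None, "cuda": None}
    # Import torch only if it is installed, so the check also runs elsewhere.
    if importlib.util.find_spec("torch") is not None:
        import torch
        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(diagnose())
```

If display comes back as None in the container where rendering fails, pointing DISPLAY at a running X server (a forwarded host display or an Xvfb instance) would be the first thing to try.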