Investigate what is needed for R support to Codespaces #9

Closed
StevenMaude opened this issue Feb 27, 2024 · 20 comments
StevenMaude commented Feb 27, 2024

The current configuration we have uses a Python image to get ehrQL and OpenSAFELY CLI running in Codespaces.

Adding R support might be a lower priority than other tasks, particularly if we focus on Codespaces initially for data extraction with OpenSAFELY, over analysis. But we know we have lots of R users on the platform, so it would be good to make it easier for them to write analysis code, without resorting to a local installation.

Tasks

  • Figure out what would be needed to make R code editing viable in Codespaces. (The OpenSAFELY R image should already work in Codespaces, but does not necessarily make it easy to get code completion for packages.)
  • Does the R action image add too much time or space to be practical to use? An alternative might be to run our own CRAN mirror restricted to the same list of packages (running a CRAN mirror is out of scope for this ticket, although describing how it's possible would be good).
  • Might we want to implement any R extras as a separate dev container configuration, or all the tooling in one?
  • Might we need to switch out the current Python image used in the dev container for another?
  • Could VSCode alone suffice as an editor for R users, without RStudio?
  • Review @remlapmot's existing work on this; it might be a useful reference or inspiration.
  • Speak to some existing R users about what they would need.

Timebox: 2-3 days.

Assumptions

  • We might assume that we are able to persuade R users to use VSCode over RStudio, if adding a viable RStudio setup to our configuration is tricky. That may or may not be a valid assumption.
@StevenMaude StevenMaude added this to the Discovery milestone Feb 27, 2024
@StevenMaude StevenMaude changed the title Investigate adding R support to Codespaces Investigate what is needed for R support to Codespaces Feb 27, 2024
@StevenMaude StevenMaude added the users Getting feedback or ideas from users label Feb 28, 2024

lucyb commented Mar 19, 2024

I gave this a try and was able to get VSCode and RStudio working via Codespaces in about two clicks (slack thread). We could potentially use something like this to allow users to write R code without needing local setup.

It seems like RStudio Server is available under an AGPL license, so we should be able to make that available via a Docker image.

We might want to do a quick spike and test it with some researchers, though.

@Jongmassey

Interesting and hopefully useful observation:

The OpenSAFELY R Docker image makes use of renv (think virtualenv but for R). This means we can fetch the renv.lock file from the R Docker image GitHub repo and use it as our local environment. This file defines the base R version, the packages installed, and their versions.

It is possible to renv::restore() the whole environment, but this takes ages and involves lots of big downloads and compilation.

It is also possible to restore() individual packages, so that only the package(s) a project needs are installed. That does diverge from the usual install.packages() workflow that the average R user would expect. I wonder if there is a way of bridging this gap?
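
A minimal sketch of the per-package restore, assuming Rscript and renv are installed and a renv.lock sits in the working directory. The helper name renv_restore_expr is hypothetical. It only builds the R expression, relying on the packages argument that renv::restore() accepts.

```shell
#!/usr/bin/env bash
# Build the R expression that restores only the named packages from renv.lock.
# renv_restore_expr is a hypothetical helper name, not part of renv.
renv_restore_expr() {
  local quoted="" pkg
  for pkg in "$@"; do
    # Join the package names as a comma-separated list of R string literals.
    quoted="${quoted:+$quoted, }\"$pkg\""
  done
  printf 'renv::restore(packages = c(%s), prompt = FALSE)\n' "$quoted"
}

# Usage (assumes Rscript and renv are available, run from the project root):
#   Rscript -e "$(renv_restore_expr dplyr ggplot2)"
```

Package versions still come from the lock file, so a wrapper like this keeps installs pinned while only fetching what a project asks for.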

@Jongmassey

We could also base our R (studio) development Docker image on a 4.0.x versioned image from https://github.com/rocker-org/rocker-versioned2/tree/master, to match the R Docker image used on job server


lucyb commented Apr 5, 2024

Good spot about renv and the lock file, that's likely to be really useful. Do you think we could use this to install the dependencies into the docker image rather than at container startup?

Would we need to have two devcontainer configurations and have the researcher choose which one to use when starting the Codespace or is it possible/practical to do it in a single devcontainer configuration?


lucyb commented Apr 5, 2024

We could also base our R (studio) development Docker image on a 4.0.x versioned image from https://github.com/rocker-org/rocker-versioned2/tree/master, to match the R Docker image used on job server

rocker have produced specific devcontainer images (here), so would one of their non-devcontainer images work? It might be worth a quick test, especially because their devcontainer specific images look like they might not support 4.0.x.


Jongmassey commented Apr 5, 2024

I'll have a look next week at monkeypatching in R - we might be able to monkey patch install.packages(...) to renv::restore(...). Having a pre-restored image containing all the packages or having them all restore on startup are both distinctly unappealing due to container size and startup time.

@remlapmot

Short comments - it's possible to make a rebuild of r-docker using rocker/rstudio:4.0.5. In fact I've already done this (but using snapshot repos instead of renv). My image size is smaller than the old r image, so there's no need for monkey patching.


remlapmot commented Apr 7, 2024

Longer comments:

My Docker file is at https://github.com/opensafely/reverse-engineer-r-docker/blob/b6049c781a04333a1cfd27fde250c94de4c73f2a/legacy-04.Dockerfile

I had already started making this before Simon switched to renv, so it doesn't use renv but rather obtains the packages using snapshots from the Posit public package manager https://packagemanager.posit.co/client/#/ (as with Linux distros, there is a long history of rebuilding R environments using snapshots in the R community, e.g. the MRAN servers, but sadly they are now shut down; Posit public package manager is the only big one left).

Because I managed to find URLs for binary packages for almost all packages this build only takes 30 mins (or faster if you have faster internet and/or computer than me).

The image is on docker hub, you can try it with

docker run --platform linux/amd64 --rm -ti \
  -e PASSWORD=yourpassword -p 8787:8787 \
  -v "/$PWD:/home/rstudio" \
  remlapmot/r-docker:2024-04-02-rstudio

As that command implies, rstudio-server is then available at localhost:8787. The login details are:

  • username: rstudio
  • password: yourpassword

When the session is finished, close the RStudio window, then press Control+C in the shell session where docker run was issued.

I haven't tried rebuilding using renv on Linux - however I am a bit worried that it might fail because the R version is so old and lots of packages have been added at version numbers after R 4.0.5 was the current version of R.

I don't think you need to worry about image size - in fact my image is currently smaller than the r image Simon used to distribute before he switched to renv, i.e.,

  • your legacy image is 1.64 GB compressed
  • My remlapmot/r-docker:2024-04-02-rstudio image is 1.4 GB compressed

For Venexia to use, I managed to use renv to restore the r image packages on a Windows 11 machine. That repo is https://github.com/remlapmot/renv-r-docker. There were 3 problem packages: BayesianTools, glmmTMB, and DHARMa. (DHARMa seems to need at least Rcpp 1.0.7 to build, and Rcpp is at 1.0.5 in the renv.lock file.) This is why I needed a setup.R script before issuing the renv::restore().

Personally I wouldn't look into monkey patching.

As Dave pointed out several months ago, once renv can use pak, renv::restore() will be much faster. That is now possible (pak does parallel package downloads). To use it you'd need to upgrade to renv 1.0.5 and enable the setting described at https://rstudio.github.io/renv/reference/config.html?q=pak#renv-config-pak-enabled. To upgrade renv in the current image, run in R:

renv::upgrade(prompt = FALSE)
renv::activate()

Then insert as the first line of .Rprofile:

options(renv.config.pak.enabled=TRUE)

Then try the renv::restore().
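
The .Rprofile step above could be scripted roughly as follows. This is a sketch only: the helper name enable_renv_pak is hypothetical, and it assumes the project's .Rprofile is the one renv created (containing the source("renv/activate.R") line).

```shell
#!/usr/bin/env bash
# Put the pak option on the first line of a project's .Rprofile,
# unless the option is already mentioned there.
# enable_renv_pak is a hypothetical helper name.
enable_renv_pak() {
  local profile="${1:-.Rprofile}"
  local line='options(renv.config.pak.enabled = TRUE)'
  touch "$profile"
  if ! grep -qF 'renv.config.pak.enabled' "$profile"; then
    local tmp
    tmp="$(mktemp)"
    # Write the option first, then the existing contents, so the option
    # is set before renv's activation script is sourced.
    { printf '%s\n' "$line"; cat "$profile"; } > "$tmp"
    cat "$tmp" > "$profile"
    rm -f "$tmp"
  fi
}

# Usage, from the project root:
#   enable_renv_pak .Rprofile
```

Running it twice leaves the file unchanged the second time, so it should be safe to call from a container setup script.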

I haven't tried that yet, but again it might currently fail because the r image has such a spread-out (in terms of dates) set of packages and an old R version (i.e., renv and pak might not be able to work out my trick for installing DHARMa).

@Jongmassey

Thanks, @remlapmot, that's super helpful. Having thought about it over the weekend, I agree that monkey patching isn't a great idea. I'll check out upgrading renv to see what the performance is like; it sounds like we might have our solution there.

@Jongmassey

To make launching RStudio easier, note that the RStudio URL is contained in the RSTUDIO_HTTP_REFERER env var:

e.g.

printenv | grep friendly
RSTUDIO_HTTP_REFERER=https://friendly-adventure-4qv9jqjj777fjw59-8787.app.github.dev/

maybe we could put a link to this in the bash MOTD or somewhere
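
One possible shape for that, as a sketch: a small function that prints the URL when the variable is set. The name rstudio_hint is hypothetical, and it assumes RSTUDIO_HTTP_REFERER is populated in the terminal session as in the output above.

```shell
#!/usr/bin/env bash
# Print a hint with the RStudio URL when the variable is set.
# rstudio_hint is a hypothetical helper name; RSTUDIO_HTTP_REFERER is the
# variable observed in the Codespace above.
rstudio_hint() {
  if [ -n "${RSTUDIO_HTTP_REFERER:-}" ]; then
    printf 'RStudio is available at: %s\n' "$RSTUDIO_HTTP_REFERER"
  fi
}

# Defining the function in ~/.bashrc and calling it there would show the
# hint in every new terminal:
#   rstudio_hint
```

It prints nothing when the variable is missing, so it would be harmless in environments without RStudio.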

@Jongmassey

Working R/Rstudio config in this PR

We could have a unified R/Python dev image by adding in the OpenSAFELY Python packages as per this PR

Both are tested in local devcontainers and, to some extent, on Codespaces. Ideally we would build said unified image with Docker and publish it under the opensafely-core org, replacing the Jongmassey one referenced in the config in those PRs.

It's a bit chunky at 8GB so could certainly do with some optimisation, too

@lucyb lucyb self-assigned this Apr 12, 2024
@remlapmot

Great work.

Not sure if I looked at the correct image, but maybe I can save you approx. 2 GB, because the R packages may have been accidentally duplicated.

If you're building from my image - I did that recreation without using renv, packages are installed into

  • /usr/local/lib/R/site-library - 422 packages, 2.0GB
  • /usr/local/lib/R/library - 16 packages, 39MB

If you then add in Simon's renv packages I think you'll additionally have

  • /renv/lib/R-4.0/x86_64-pc-linux-gnu - 424 packages, 2.3GB
  • And you'll also have /renv/sandbox/R-4.0/x86_64-pc-linux-gnu/9a444a72 with 14 symlinks to packages in /usr/lib/R/library - but these will be broken unless you've copied across those files also.

So if you build from my image you don't need Simon's packages; or if you use Simon's packages you don't need my image and you can build straight from rocker/rstudio:4.0.5.
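
A quick way to check for that duplication inside a built image might look like the sketch below. The helper name lib_report is hypothetical; the paths in the usage line are the ones listed above.

```shell
#!/usr/bin/env bash
# Report the package count and on-disk size of each R library directory given.
# lib_report is a hypothetical helper name.
lib_report() {
  local dir count size
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      # Each installed package is one top-level entry (directory or symlink).
      count="$(find "$dir" -mindepth 1 -maxdepth 1 \( -type d -o -type l \) | wc -l | tr -d ' ')"
      size="$(du -sh "$dir" | cut -f1)"
      printf '%s\t%s packages\t%s\n' "$dir" "$count" "$size"
    else
      printf '%s\tmissing\n' "$dir"
    fi
  done
}

# Usage inside the image:
#   lib_report /usr/local/lib/R/site-library /usr/local/lib/R/library \
#     /renv/lib/R-4.0/x86_64-pc-linux-gnu
```

Two libraries with similar package counts and sizes would suggest the same packages have been installed twice.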

lucyb added a commit to lucyb/test-devcontainers that referenced this issue Apr 15, 2024
This uses the standard Rocker image and saves us over 3 gig in space (see comment on opensafely-core#9).

It also moves the renv setup into a separate script, where we also set up the
rstudio environment.

lucyb commented Apr 15, 2024

Thanks for that @remlapmot . Jon is away in sunny Canada this week, so I'm picking up the work. Your comment was very helpful this morning. I've brought together all of the changes into a draft PR here. You're welcome to have a look, but don't feel you have to.

I found that renv wasn't set up correctly, so it wasn't picking up the packages from Simon's image. For the moment, I've moved it over to use your image. This works and will allow me to get some feedback on the changes so far.

For reference, using the rocker/rstudio:4.0.5 image and copying the packages over resulted in an image that was 6GB in size (although, importantly, it wasn't working), whereas basing it on your image results in a 6.8GB image.


lucyb commented Apr 18, 2024

I'm closing this ticket and moving the discussion to #43, since we've finished "investigating" and now want to talk about rolling out the changes.

@lucyb lucyb closed this as completed Apr 18, 2024
@remlapmot

I tried with the Codespace: the VSCode interface launched successfully, and I can access an R session from its Terminal, but I can't seem to get past the rstudio-server login screen. I tried username: rstudio and password: yourpassword, which are usually the defaults, but they failed.

[Screenshot, 2024-04-19 10:25:51]

Maybe I did something wrong - there are some comments in the rocker/rstudio docs about disabling authentication https://rocker-project.org/images/versioned/rstudio.html#disable_auth. I tried setting export DISABLE_AUTH=true in the Codespace Terminal and restarting the rstudio-server session but that didn't do the trick.


lucyb commented Apr 19, 2024

Ah sorry @remlapmot , I should have said, it's:

username: rstudio
password: rstudio

I didn't set it, so assumed that was the default somewhere.

@remlapmot

Thanks - oh yes.

It works great.

@remlapmot

Another possible addition for you to consider, Lucy and Jon.

If a user prefers to use VS Code (rather than RStudio) to edit their R code they can add the R language extension to their Codespace.

https://marketplace.visualstudio.com/items?itemName=REditorSupport.r

To make that work it requires the languageserver R package - again the user can add that themselves.

(If a user was really proficient with the VS Code R extension they might additionally want the radian console and httpgd R package added - but again I guess they can do that themselves.)

So, if you think it would be helpful, Lucy, you could add the R language extension to the devcontainer, and I could ask for the languageserver package to be added to the r image (as then I'd also add it to my rstudio image and the 2 would still be in sync).

I admit I don't have a good sense of how many users would prefer VS Code over RStudio.


lucyb commented Apr 23, 2024

Thanks Tom, that's really helpful. From experience, I believe most people would prefer RStudio over VS Code for writing code, although I know a few of our users do use VS Code. I might have a go at installing the VS Code extension, but I'm currently keen to focus on getting the change out and getting some more feedback before doing too much more work. I also want to make it easier for us to roll out future changes too. Anyway, I'll get back to you if I do take a look.

lucyb added a commit to opensafely/research-template that referenced this issue Apr 23, 2024
This uses the standard Rocker image and saves us over 3 gig in space (see comment on opensafely-core#9).

It also moves the renv setup into a separate script, where we also set up the
rstudio environment.
lucyb added a commit to opensafely/research-template that referenced this issue Apr 24, 2024
This uses the standard Rocker image and saves us over 3 gig in space (see comment on opensafely-core#9).

It also moves the renv setup into a separate script, where we also set up the
rstudio environment.
@Jongmassey

I've played around a bit with the R vscode extension, the languageserver package, radian and httpgd. All seemed to install with minimal drama
