Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R packages don't exactly match those in the OpenSAFELY R action image #22

Closed
Jongmassey opened this issue May 15, 2024 · 2 comments
Closed

Comments

@Jongmassey
Copy link
Contributor

There are three extra R packages installed in this image compared to what's in the OpenSAFELY R action image: littler, docopt, and spatial

littler and docopt are explicitly added in the Rocker Dockerfile (although not in an easy machine-parseable way) and are system/CLI integration packages which a research user is unlikely to make use of. Spatial is a bit of an oddity as it's in the R recommended packages so don't know why it's not in the OpenSAFELY R action image. It's a package that provides statistical/analytical functions so there's a small risk a user might write an analytical R script that makes use of this then find that it doesn't work via opensafely run .

I'd earlier misdiagnosed discrepancy between what's in the OpenSAFELY R action image and what's in the image we build for codespaces. The aspects of this relevant to testing are in this PR comment, but in summary:
Using the R venv config file as an authoritative source of what's installed in the OpenSAFELY R action image was a bad idea. Instead, I propose we use the packages.csv file in the opensafely-r-docker repo instead. This latter file is used during the build process as the source of R packages to be installed into the renv. For packages which are already installed (by virtue of being part of r-base etc.) then they are not added to the renv config.

I think we just leave all three of these discrepant packages in place, and see if any users get tripped up by this.
For the purposes of "testing if the R packages have been installed correctly", it also makes it a lot simpler - we can just exclude these three from the comparison. If any more discrepancies arrive in the future, we want the test to fail as it's something we should be aware of.

@StevenMaude
Copy link
Contributor

StevenMaude commented May 15, 2024

I was going to add this to the PR thread in #13, but it's probably more appropriate here.

This may have already been covered by the discussion in #17. In any case, some of the packages are installed twice. This is the diff of the OpenSAFELY R Docker image packages.csv with a similar CSV created from R in a codespace:

diff --git a/packages.csv b/packages.csv
index 59c1e63..cf4360a 100644
--- a/packages.csv
+++ b/packages.csv
@@ -68,6 +68,7 @@
 "DHARMa","0.4.6"
 "digest","0.6.25"
 "distributional","0.2.0"
+"docopt","0.7.1"
 "doParallel","1.0.15"
 "dplyr","1.0.2"
 "DT","0.15"
@@ -199,6 +200,7 @@
 "lcmm","2.0.0"
 "lifecycle","1.0.3"
 "listenv","0.8.0"
+"littler","0.3.12"
 "lme4","1.1-23"
 "lmtest","0.9-37"
 "locfit","1.5-9.4"
@@ -424,16 +426,31 @@
 "zip","2.1.1"
 "zoo","1.8-8"
 "base","4.0.5"
+"boot","1.3-27"
+"class","7.3-18"
+"cluster","2.1.1"
+"codetools","0.2-18"
 "compiler","4.0.5"
 "datasets","4.0.5"
+"foreign","0.8-81"
 "graphics","4.0.5"
 "grDevices","4.0.5"
 "grid","4.0.5"
+"KernSmooth","2.23-18"
+"lattice","0.20-41"
+"MASS","7.3-53.1"
+"Matrix","1.3-2"
 "methods","4.0.5"
+"mgcv","1.8-34"
+"nlme","3.1-152"
+"nnet","7.3-15"
 "parallel","4.0.5"
+"rpart","4.1-15"
+"spatial","7.3-13"
 "splines","4.0.5"
 "stats","4.0.5"
 "stats4","4.0.5"
+"survival","3.2-10"
 "tcltk","4.0.5"
 "tools","4.0.5"
 "utils","4.0.5"

Out of these, three are the packages that were already mentioned in #13 as known to be in the Rocker image, but not the OpenSAFELY R image (docopt, littler, spatial).

The rest that correspond to additional lines are installed twice, once by the Rocker image, and once by the OpenSAFELY R Docker image. In most cases — except for lattice and rpart — these are slightly different versions, which I guess could potentially be a problem.

However, I'm guessing that we're saved by the library path configuration:

> .libPaths()
[1] "/usr/local/lib/R/site-library" "/usr/local/lib/R/library"

so that the OpenSAFELY R packages copied over take precedence over the Rocker packages.

But, if we could remove the duplicates, we could also more cleanly compare the installed versions of the packages, instead of just the package names. (Or at least be guaranteed that the installed packages are installed in the correct place.)

@lucyb
Copy link
Contributor

lucyb commented May 21, 2024

Testing this out in a dev container, I ran the following R code:

> packageVersion("KernSmooth")
[1] ‘2.23.17’
> packageVersion("foreign")
[1] ‘0.8.80’
> packageVersion("lattice")
[1] ‘0.20.41’
> packageVersion("survival")
[1] ‘3.2.3’

These correspond to the versions currently found in the r-docker packages.csv.

Therefore, it seems as though the correct versions of packages are being loaded and unless it causes significant problems with testing we shouldn't do anything to fix this right now. Although, we should continue to remind researchers and co-pilots to use opensafely run and opensafely exec to test their code rather than rely solely on the development environment being correct.

@lucyb lucyb closed this as completed May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants