Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate out system dependencies of Bioconductor. #40

Merged
merged 6 commits into from
Jan 31, 2022
Merged

Conversation

nturaga
Copy link
Collaborator

@nturaga nturaga commented Jan 5, 2022

It is a step towards a better analysis of our system dependency structure in our docker images

  1. The PR separates the system dependencies required to install most Bioconductor packages into a file bioc_scripts/install_bioc_sysdeps.sh.

  2. It also reduces the number of layers within the Docker image, causing a size reduction of the biconductor_docker:devel from 3.99GB to 3.7GB.

  3. Create a folder called bioc_scripts which hosts the install.R (install BiocManager) script and the install_bioc_sysdeps.sh.

  4. Update the test-images.yaml workflow file to check the RELEASE_3_14 bioconductor_docker image instead of 3_13; this update should have been made during release (October) time.

Testing:

The built image of this PR is under nitesh1989/bioconductor_docker:devel.

1. The PR separates out the system dependencies required to install most Bioconductor packages into a file `bioc_scripts/install_bioc_sysdeps.sh`.

2. It also reduces the number of layers within the Docker image causing a size reduction of the `biconductor_docker:devel` from 3.99GB to 3.7GB.

3. Create a folder called `bioc_scripts` which hosts the install.R (install BiocManager) script and the `install_bioc_sysdeps.sh`.

4. Update the test-images.yaml workflow file to check the RELEASE_3_14 bioconductor_docker image instead of 3_13. This should have been done during release time.

Testing:

The built image of this PR is under `nitesh1989/bioconductor_docker:devel`.
@nturaga nturaga requested a review from jwokaty January 5, 2022 16:43
@nturaga nturaga self-assigned this Jan 5, 2022
@nturaga nturaga added the enhancement New feature or request label Jan 5, 2022
@nturaga nturaga requested a review from mtmorgan January 5, 2022 19:45
@nturaga
Copy link
Collaborator Author

nturaga commented Jan 12, 2022

Thanks @jwokaty.

I resolved the issues.

@jwokaty
Copy link
Contributor

jwokaty commented Jan 12, 2022

I'm still testing the docker. When I finish that, I'll complete my review. I just need a little more time.

Copy link
Contributor

@jwokaty jwokaty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest if possible to document the CRAN or Bioc packages requiring a particular APT or Python package that way when the package is removed or when the package no longer needs that APT or Python package, you can spot it easily. This doesn't necessarily need to live in the script but could be in something like a NEWS or changes file. I would document the Docker-specific ones, since we've already documented the ones in the BBS.

I think some formatting issues were introduced such as https://github.com/Bioconductor/bioconductor_docker/pull/40/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R39-R41 and https://github.com/Bioconductor/bioconductor_docker/pull/40/files#diff-2d1ea27c76ebaad5b7e5a3d127128007970253d3789de4bcf830949604a43cd9R15.

I would recommend looking at some of my comments in my PR that didn't get merged, particularly around and after #32 (comment). I compared packages in Rocker and also at that time commented out packages that had no documented reason, added them back to see if they allowed additional Bioc packages to be installed. I think you might be able to remove more APT packages.

I recommend clearer comments/groupings. For example, https://github.com/Bioconductor/bioconductor_docker/pull/40/files#diff-2d1ea27c76ebaad5b7e5a3d127128007970253d3789de4bcf830949604a43cd9R116 after removing libmariadb-dev-compat, which I believe will conflict with the earlier installed mysql library necessitating separate installations, this section might be better commented/grouped as "Image libraries" rather than "More additional resources" (and I might move other image libraries there). I also don't think you need libmariadb-dev-compat.

After testing, I was unable to install the following packages, which I also didn't find in the list you provided.

 [1] "basecallQC"             "canceR"                 "CONFESS"               
 [4] "cytomapper"             "gpuMagic"               "granulator"            
 [7] "imcRtools"              "infinityFlow"           "lisaClust"             
[10] "MsBackendRawFileReader" "networkBMA"             "NormalyzerDE"          
[13] "OpenStats"              "rawrr"                  "schex"                 
[16] "scTensor"               "scTGIF"                 "SNPhood"               
[19] "sojourner"              "spatialHeatmap"         "spicyR"                
[22] "Travel"  

@nturaga
Copy link
Collaborator Author

nturaga commented Jan 14, 2022

Inline replies, but first, thanks for the review @jwokaty.

I would suggest if possible to document the CRAN or Bioc packages requiring a particular APT or Python package that way when the package is removed or when the package no longer needs that APT or Python package, you can spot it easily. This doesn't necessarily need to live in the script but could be in something like a NEWS or changes file. I would document the Docker-specific ones, since we've already documented the ones in the BBS.

I will try to document this. Would you like to pair on this aspect of it and meet briefly?

I think some formatting issues were introduced such as https://github.com/Bioconductor/bioconductor_docker/pull/40/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R39-R41 and https://github.com/Bioconductor/bioconductor_docker/pull/40/files#diff-2d1ea27c76ebaad5b7e5a3d127128007970253d3789de4bcf830949604a43cd9R15.

I will fix formatting issues. Thanks for catching them.

I would recommend looking at some of my comments in my PR that didn't get merged, particularly around and after #32 (comment). I compared packages in Rocker and also at that time commented out packages that had no documented reason, added them back to see if they allowed additional Bioc packages to be installed. I think you might be able to remove more APT packages.

I recommend clearer comments/groupings. For example, https://github.com/Bioconductor/bioconductor_docker/pull/40/files#diff-2d1ea27c76ebaad5b7e5a3d127128007970253d3789de4bcf830949604a43cd9R116 after removing libmariadb-dev-compat, which I believe will conflict with the earlier installed mysql library necessitating separate installations, this section might be better commented/grouped as "Image libraries" rather than "More additional resources" (and I might move other image libraries there). I also don't think you need libmariadb-dev-compat.

Some packages listed below won't install, because there is no tensorflow in this image. I'm figuring out why each of these packages fails, and will get specifics.

After testing, I was unable to install the following packages, which I also didn't find in the list you provided.

 [1] "basecallQC"             "canceR"                 "CONFESS"               
 [4] "cytomapper"             "gpuMagic"               "granulator"            
 [7] "imcRtools"              "infinityFlow"           "lisaClust"             
[10] "MsBackendRawFileReader" "networkBMA"             "NormalyzerDE"          
[13] "OpenStats"              "rawrr"                  "schex"                 
[16] "scTensor"               "scTGIF"                 "SNPhood"               
[19] "sojourner"              "spatialHeatmap"         "spicyR"                
[22] "Travel"  

I also want to clarify the goals of this PR, it's really just to put the dependencies in a seperate file. Before progressing towards a more complete image which installs more of the devel packages.

@jwokaty
Copy link
Contributor

jwokaty commented Jan 14, 2022

About the packages I couldn't install, I think they are in agreement with the list you gave me. (They were not in the list.) So this isn't an issue and I just put them there as reference.

@nturaga
Copy link
Collaborator Author

nturaga commented Jan 14, 2022

  • libavfilter-dev - <infinityFlow, host of other packages>
  • mono-runtime - <rawrr, MsBackendRawFileReader>
  • libfuse-dev -
  • ocl-icd-opencl-dev - - but machine needs to be a GPU--otherwise it's useless

I was able to reduce that list from what you had to

[1] "canceR"     "networkBMA" "SNPhood"   

@nturaga nturaga removed the request for review from mtmorgan January 22, 2022 19:42
@nturaga
Copy link
Collaborator Author

nturaga commented Jan 29, 2022

I believe this PR is ready to be merged. Couple of points in favor of merging,

  1. If we include other changes, we run the risk of making it too large a PR.
  2. This PR does what it was intended to do - separate out system dependencies.

I recommend adding another PR to make a list of sys libraries <-> bioc package relation. And also figure out https://gist.github.com/nturaga/154fb17dd401435e769c4b444e95b532, which compares the linux builder to the docker image as of today.

I am in favor of CHANGELOG.md or better "commit" messages which we discussed but need to figure out a format for them.

@nturaga nturaga merged commit b92c946 into master Jan 31, 2022
@nturaga nturaga deleted the install_bioc_deps branch January 31, 2022 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants