Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maaslin3 #3643

Open
10 tasks done
WillNickols opened this issue Nov 3, 2024 · 10 comments
Open
10 tasks done

maaslin3 #3643

WillNickols opened this issue Nov 3, 2024 · 10 comments
Assignees
Labels
2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place OK

Comments

@WillNickols
Copy link

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Copy link
Collaborator

Hi @WillNickols

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: maaslin3
Title: "Refining and extending generalized multivariate linear models
        for meta-omic association discovery"
Year: 2024
Version: 0.99.0
Authors@R: c(
  person("William", "Nickols", email = "[email protected]", role = c("aut", "cre"), comment=c(ORCID="0000-0001-8214-9746")), 
  person("Jacob", "Nearing", email = "[email protected]", role = "aut")
  )
Depends: R (>= 4.4)
Description: MaAsLin 3 refines and extends generalized multivariate linear models for meta-omicron association discovery. It finds abundance and prevalence associations between microbiome meta-omics features and complex metadata in population-scale epidemiological studies. The software includes multiple analysis methods (including support for multiple covariates, repeated measures, and ordered predictors), filtering, normalization, and transform options to customize analysis for your specific study.
License: MIT + file LICENSE
Imports: dplyr, pbapply, lmerTest, parallel, lme4, optparse, logging,
        multcomp, ggplot2, RColorBrewer, patchwork, scales,
        rlang, tibble, ggnewscale, survival, methods, BiocGenerics
Suggests: knitr, testthat (>= 2.1.0), rmarkdown, markdown, kableExtra
VignetteBuilder: knitr
Collate: fit.R utility_scripts.R viz.R maaslin3.R
URL: http://huttenhower.sph.harvard.edu/maaslin3
biocViews: Metagenomics, Software, Microbiome, Normalization, MultipleComparison
BugReports: https://github.com/biobakery/maaslin3/issues
NeedsCompilation: no
Packaged: 2024-07-16 15:01:54 UTC; williamnickols

@bioc-issue-bot bioc-issue-bot added the 1. awaiting moderation submitted and waiting clearance to access resources label Nov 3, 2024
@lshep
Copy link
Contributor

lshep commented Nov 22, 2024

Please also provide an inst/scripts directory that describes how the data in inst/extdata was generated. It can be code, pseudo-code, or text but should minimally list any source or licensing information.

Do you have an approximation on how long this takes to build? I killed after 20 min. which would be above our time limit.

I also don't see any integration with existing Bioconductor class structures. See interoperability requirement. You should show interoperability with existing microbiome packages

@lshep lshep added 3e. pending pre-review changes review has indicated blocking concern that needs attention 3d. needs interop Package must explicitly use Bioconductor structures and methods labels Nov 22, 2024
@WillNickols
Copy link
Author

Hi! Thanks for the helpful suggestions. We've updated the following accordingly:

  1. We've added an inst/scripts directory that describes how the data in inst/extdata were generated and where the raw data can be found.
  2. We've moved out some files we mainly use for microbiome classes, so it's now taking 6 minutes to build on my computer and 8 in GitHub actions. Most of this time comes from building the vignettes since they demonstrate many non-overlapping analysis options included in the package.
  3. We've added in an option to run maaslin3, the main function that we expect almost all users will use, with a SummarizedExperiment input and explained how to do so in the documentation and vignette. This should allow MaAsLin 3 to take the same sort of inputs used by ANCOM-BC, a similar tool. Based on user feedback on our forum, we believe most users will use the base R options since they are typically reading in their taxonomic abundance tables and metadata from disk rather than from other BioC packages. Still, this change should be sufficient for anyone using a SummarizedExperiment input.

We've pushed a new version updated from 0.99.0 to 0.99.1. Please let us know if there's anything else we should do!

@lshep lshep added pre-check passed pre-review performed and ready to be added to git and removed 3e. pending pre-review changes review has indicated blocking concern that needs attention 3d. needs interop Package must explicitly use Bioconductor structures and methods labels Nov 27, 2024
@bioc-issue-bot
Copy link
Collaborator

Your package has been added to git.bioconductor.org to continue the
pre-review process. A build report will be posted shortly. Please
fix any ERROR and WARNING in the build report before a reviewer is
assigned or provide a justification on why you feel the ERROR or
WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. All changes should be
pushed to git.bioconductor.org moving forward. It is required to push a
version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to active your
Bioconductor Git Credentials Account

@bioc-issue-bot bioc-issue-bot added pre-review on bioconductor git and access to on demand build but not assigned reviewer until build report clean and removed 1. awaiting moderation submitted and waiting clearance to access resources pre-check passed pre-review performed and ready to be added to git labels Nov 27, 2024
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 24.04.1 LTS): maaslin3_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/maaslin3 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@lshep lshep added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed pre-review on bioconductor git and access to on demand build but not assigned reviewer until build report clean labels Dec 3, 2024
@bioc-issue-bot
Copy link
Collaborator

A reviewer has been assigned to your package for an indepth review.
Please respond accordingly to any further comments from the reviewer.

@DarioS
Copy link

DarioS commented Dec 9, 2024

  • Don't include GitHub installation instructions but only recommend BioManager installation. It also does GitHub.
BiocManager::install("biobakery/maaslin3") # Update the value to "maaslin3" after acceptance and branch creation.

Also, mention that if user wants to compile vignette themselves, they should also specify dependencies = TRUE.

  • Use sapply or vapply instead of for loops. Please read Vectorize section of Developer's Guide. For example,
all_factors <- c()
for (col in colnames(dat_sub)) {
  if (!is.numeric(dat_sub[, col])) {
    if (is.factor(dat_sub[, col])) {
      factor_levels <- levels(dat_sub[, col])
    } else {
       factor_levels <- levels(factor(dat_sub[, col]))
    }
    # All factor levels except baseline
    all_factors <- c(all_factors, paste0(col, unique(factor_levels[-1])))
  }
}
  • Supports SummarizedExperiment as input, but not TreeSummarizedExperiment. TreeSummarizedExperiment allows for easy switching between different feature granularity (e.g. aggregate species to genus). This is a core class for microbiome data.
  • Lack of input parameter value checking. It should be the first task done within a function for parameters that the a value within a small set of valid values, such as model. See a match.arg tutorial for an example of how to use match.arg.
  • It is not clear whether the user should read manual or tutorial first. Please clarify somehow.
  • Bioconductor has its own data frame in S4Vectors named DataFrame. MaAsLin 3 can't handle it.
> library(S4Vectors)
> metadata <- as(metadata, "DataFrame")
> metadata[, 1:5]
DataFrame with 1527 rows and 5 columns
           participant_id      site_name  week_num     reads diagnosis
              <character>    <character> <integer> <integer>  <factor>
CSM5FZ3N_P          C3001   Cedars-Sinai         0   9961743        CD
CSM5FZ3R_P          C3001   Cedars-Sinai         2  16456391        CD
CSM5FZ3T_P          C3002   Cedars-Sinai         0  10511448        CD
CSM5FZ3V_P          C3001   Cedars-Sinai         6  17808965        CD
CSM5FZ3X_P          C3002   Cedars-Sinai         2  13160893        CD
...                   ...            ...       ...       ...       ...
PSMB4MBS            P6037 MGH Pediatrics        33  11589722        CD
PSMB4MC1            P6038 MGH Pediatrics        34  15565445        UC
PSMB4MC3            P6038 MGH Pediatrics        35   7319782        UC
PSMB4MC5            P6038 MGH Pediatrics        37   6913198        UC
PSMB4MC7            P6035 MGH Pediatrics        38   5297087        UC
> fit_out <- maaslin3(input_data = taxa_table, input_metadata = metadata, output = 'hmp2_output', ...
Error in maaslin_read_data(input_data, input_metadata, feature_specific_covariate,  : 
  input_metadata is neither a file nor a data frame!

Please improve the level of interoperability with core Bioconductor infrastructure.

  • MaAsLin 3's new functionality sounds a lot like LinDA. It would be nice to see a feature comparison table with other R packages designed for the same purpose. See Comparison to Existing for an example of such a table. What is the great new feature that makes MaAsLin 3 irresistible compared to alternatives? It is not clear to me.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: a8a702c661709bc70d1023b77c8e4461ecb37c1e

@WillNickols
Copy link
Author

Thanks for the helpful suggestions! We’ve made the following changes:

  1. Removed GitHub installation instructions and recommended BiocManager installation instead.
  2. Mentioned that if the user wants to compile vignettes themselves, they should specify dependencies = TRUE.
  3. Replaced all “for” loops with vapply except for those that repeatedly modify the same object.
  4. Added explicit support for TreeSummarizedExperiment as input. This actually worked as input before, but we’ve now made it explicit and added a test case in test_Maaslin3.R.
  5. Added match.arg checks for parameters with a small set of character options. There is also substantial parameter checking in the documented functions (those intended for users) either directly or with maaslin_check_arguments.
  6. Clarified on the README and vignettes that users should start with the tutorial vignette and then refer to the manual vignette as necessary.
  7. Allowed handling of Bioconductor’s own data frame in S4Vectors named DataFrame.
  8. Included a table in the tutorial, manual, and README documenting the features available in MaAsLin 3, along with the other popular microbiome differential abundance tools MaAsLin 2, ALDEx2, ANCOM-BC2, and LinDA. In addition, our new preprint (https://doi.org/10.1101/2024.12.13.628459) documents other improvements and benchmarking. This preprint is linked in the citation section of the tutorial, manual, and README.

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 24.04.1 LTS): maaslin3_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/maaslin3 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place OK
Projects
None yet
Development

No branches or pull requests

4 participants