Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ELViS #3631

Open
10 tasks done
JYLeeBioinfo opened this issue Oct 30, 2024 · 20 comments
Open
10 tasks done

ELViS #3631

JYLeeBioinfo opened this issue Oct 30, 2024 · 20 comments
Assignees
Labels
2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place OK

Comments

@JYLeeBioinfo
Copy link

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Copy link
Collaborator

Hi @JYLeeBioinfo

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: ELViS
Title: An R Package for Estimating Copy Number Levels of Viral Genome Segments Using Base-Resolution Read Depth Profile
Version: 0.99.0
Authors@R: c(
    person("Hyo Young", "Choi", , "[email protected]", role = c("aut", "cph"),
 comment = c(ORCID = "0000-0002-7627-8493")),
    person("Jin-Young", "Lee", , "[email protected]", role = c("aut", "cre", "cph"),
 comment = c(ORCID = "0000-0002-5366-7488")),
    person("Xiaobei", "Zhao", , "[email protected]", role = "ctb",
 comment = c(ORCID = "0000-0002-5277-0846")),
    person("Jeremiah R.", "Holt", , "[email protected]", role = "ctb",
 comment = c(ORCID = "0000-0002-5201-5015")),
    person("Katherine A.", "Hoadley", , "[email protected]", role = "aut",
 comment = c(ORCID = "0000-0002-1216-477X")),
    person("D. Neil", "Hayes", , "[email protected]", role = c("aut", "fnd", "cph"),
 comment = c(ORCID = "0000-0001-6203-7771"))
  )
Description: Base-resolution copy number analysis of viral genome. Utilizes base-resolution read depth data over viral genome to find copy number segments with two-dimensional segmentation approach. Provides publish-ready figures, including histograms of read depths, coverage line plots over viral genome annotated with copy number change events and viral genes, and heatmaps showing multiple types of data with integrative clustering of samples.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
VignetteBuilder: knitr
biocViews: CopyNumberVariation, Coverage, GenomicVariation, BiomedicalInformatics, Sequencing, Normalization, Visualization, Clustering
LazyData: false
BugReports: https://github.com/hyochoi/ELViS/issues
URL: https://github.com/hyochoi/ELViS
Config/testthat/edition: 3
Imports: 
    circlize,
    ComplexHeatmap,
    data.table,
    dplyr,
    GenomicFeatures,
    GenomicRanges,
    ggplot2,
    glue,
    graphics,
    grDevices,
    igraph,
    knitr,
    magrittr,
    memoise,
    parallel,
    patchwork,
    reticulate,
    rmarkdown,
    scales,
    segclust2d,
    stats,
    stringr,
    txdbmaker,
    utils,
    uuid,
    zoo
Depends: R (>= 4.3)
Suggests: 
    Rsamtools,
    BiocManager,
    testthat (>= 3.0.0)

@bioc-issue-bot bioc-issue-bot added the 1. awaiting moderation submitted and waiting clearance to access resources label Oct 30, 2024
@lshep
Copy link
Contributor

lshep commented Nov 21, 2024

asset is not a standard top level directory. What is this and what does it
contain?

Bioconductor recommends the use of basilisk over reticulate. Please update.

Vignettes should use a tempdir() for executing their examples and to save any
files not a home directory.

vignettes/ELViSPrecisely_Toy_Example.Rmd:68:analysis_dir = "~/ELViS"
vignettes/ELViSPrecisely_Toy_Example.Rmd:206:saveRDS(base_resol_depth,"~/base_resol_depth.rds")

@lshep lshep added the 3e. pending pre-review changes review has indicated blocking concern that needs attention label Nov 21, 2024
@JYLeeBioinfo
Copy link
Author

Hi Ishep,
I appreciate you reaching out to me to check on these issues.

  1. asset directory
  • asset directory contains code snippets that I used during package development, but it should be excluded when distributing this package.
  • I'll remove this
  1. reticulate
  • Thank you for introducing a great package as basilisk! I'll rework the code to use basilisk.
  1. analysis_dir
  • I'll change this from "~/ELViS" to tempdir().

Thank you again for kindly informing me of the adjustments needed.
If there's anything that needs correction, please feel free to let me know.

Thank you,
Jay

@JYLeeBioinfo
Copy link
Author

I have finished 1,3 but it will take some time for "2. reticulate -> basilisk".
I'll let you know right after 2. is finished.

@JYLeeBioinfo
Copy link
Author

I completely replaced reticulate with basilisk in Process_Bam.R. (2.)
Also, reticulate is not in the import section of the DESCRIPTION file anymore.

Thank you again for guiding us to make our package comply with the Bioconductor recommendations.

@lshep
Copy link
Contributor

lshep commented Nov 27, 2024

Please also provide an inst/scripts directory that describes how the data
in inst/extdata was generated. It can be code, pseudo-code, or text but
should minimally list any source or licensing information.

You still have a reference in vignette to

tmpdir="./tmpdir"
dir.create(tmpdir,recursive = TRUE)

which will create a tmpdir persistent directory in the current directory.
Please use tempdir() or tempfile()

> system.time({
    mtrx_samtools_reticulate__example = 
        get_depth_matrix(
            bam_files = bam_files,target_virus_name = target_virus_name
            ,mode = "samtools_basilisk"
            ,N_cores = N_cores
            ,min_mapq = 30
            ,tmpdir=tmpdir
            ,condaenv = "env_samtools"
            ,condaenv_samtools_version="1.21"
        )
})
The path to samtools not provided.
Default samtools is used : /home/lorikern/.cache/R/basilisk/1.19.0/ELViS/0.99.0/env_samtools/bin/samtools
The path to samtools not provided.
Default samtools is used : /home/lorikern/.cache/R/basilisk/1.19.0/ELViS/0.99.0/env_samtools/bin/samtools
   user  system elapsed 
  0.013   0.029   0.060 
Warning message:
In dir.create(tmpdir) : './tmpdir' already exists

It also seems like your function tries to create the directory and produces a
warning that it already exists. Likely in your function you need to check if it
exists and only run creation if it doesn't.

@lshep lshep added pre-check passed pre-review performed and ready to be added to git and removed 3e. pending pre-review changes review has indicated blocking concern that needs attention labels Nov 27, 2024
@bioc-issue-bot
Copy link
Collaborator

Your package has been added to git.bioconductor.org to continue the
pre-review process. A build report will be posted shortly. Please
fix any ERROR and WARNING in the build report before a reviewer is
assigned or provide a justification on why you feel the ERROR or
WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. All changes should be
pushed to git.bioconductor.org moving forward. It is required to push a
version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to active your
Bioconductor Git Credentials Account

@bioc-issue-bot bioc-issue-bot added pre-review on bioconductor git and access to on demand build but not assigned reviewer until build report clean and removed 1. awaiting moderation submitted and waiting clearance to access resources pre-check passed pre-review performed and ready to be added to git labels Nov 27, 2024
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 24.04.1 LTS): ELViS_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/ELViS to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@JYLeeBioinfo
Copy link
Author

JYLeeBioinfo commented Nov 27, 2024

Hi @lshep,
Thank you again for letting us know the points to improve.

1. tmpdir

  1. I changed this code in the example of the function get_depth_matrix
    from this
tmpdir="./tmpdir"
dir.create(tmpdir,recursive = TRUE)

to the following code.

tmpdir <- tempdir()
  1. In the same function, I changed this code
    from this
dir.create(tmpdir)

to the following code

if(!dir.exists(tmpdir)){ dir.create(tmpdir) }
  1. inst/scripts
    Following your comments, I added a file names README_extdata.txt to inst/scripts directory.

Thank you again for helping us to improve this package!
Jay

p.s.
I'll fix the errors in the build report and push the repository again.
Actually, we checked this package with check() and BiocCheck() both on our server and on github workflow, but BiocGenerics and IRange had not been needed to be listed in the Import section of DESCRIPTION file. But I'll try adding them to Import and see if errors are removed.

@JYLeeBioinfo
Copy link
Author

JYLeeBioinfo commented Nov 28, 2024

Hi @lshep,
I reflected changes I mentioned in the earlier comment and fixed errors that appeared on the build report.
And then I pushed to git.bioconductor.org:packages/ELViS.git

image

But, bioc-issue-bot doesn't seem to respond...

Am I doing something wrong..?

@lshep
Copy link
Contributor

lshep commented Nov 29, 2024

Did you do a valid version bump to 0.99.1? Bioconductor will only recognize changes with a valid version bump

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 6f70f192eeb16411ad90df26fd0bc6bed72039cc

@JYLeeBioinfo
Copy link
Author

Thank you for pointing out the version bump issue. I corrected the version and it seems bot is responding well.

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 24.04.1 LTS): ELViS_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/ELViS to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: cf2ef205f9af4cee057976ff1525098fce6c45f3

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 24.04.1 LTS): ELViS_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/ELViS to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot bioc-issue-bot added OK and removed WARNINGS labels Dec 5, 2024
@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: e3036cab51e968ef67f22eb15a55f9e79bbbe9a4

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 24.04.1 LTS): ELViS_0.99.3.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/ELViS to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@lshep lshep added the 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place label Dec 10, 2024
@bioc-issue-bot
Copy link
Collaborator

A reviewer has been assigned to your package for an indepth review.
Please respond accordingly to any further comments from the reviewer.

@bioc-issue-bot bioc-issue-bot removed the pre-review on bioconductor git and access to on demand build but not assigned reviewer until build report clean label Dec 10, 2024
@jianhong
Copy link

Package 'ELViS' Review

Thank you for submitting your package to Bioconductor. The package passed check and build. However there are several things need to be fixed. Please try to answer the comments line by line when you are ready for a second review.
Code: Note: please consider; Important: must be addressed.

The NAMESPACE file

  • Important: Selective imports using importFrom instead of import all with import.

    • in line 15 import(ComplexHeatmap)
    • in line 18 import(RBGL, except=c(transitivity,bfs,dfs))
    • in line 19 import(basilisk)
    • in line 20 import(circlize, except=c(degree))
    • in line 25 import(glue, except=c(trim))
    • in line 27 import(knitr)
    • in line 29 import(memoise)
    • in line 30 import(parallel)
    • in line 31 import(patchwork)
    • in line 32 import(rmarkdown)
    • in line 33 import(scales)
    • in line 34 import(segclust2d)
    • in line 35 import(stringr)
    • in line 36 import(uuid)
    • in line 37 import(zoo, except=c(index,yearmon,yearqtr,"index<-"))
  • Important: Function names use camelCase or snake_case and do not include ..

    • in line 11 export(norm.fun)

General package development

R code

  • Important:is() or inherits() instead of class().
    • In file R/main.R:
      • at line 1132 found ' if (sum(class(variable) %in% c("double","numeric")) > 0) {'
      • at line 1146 found ' if (sum(class(variable) %in% "factor") == 0) variable <- factor(variable)'
      • at line 1155 found ' if (sum(class(variable) %in% c("ordered")) > 0) {'
  • NOTE: expose overwrite parameter.
    • In file R/main.R:
      • at line 807 found ' ,overwrite=FALSE'
  • NOTE: :: is not suggested in source code unless you can make sure all the packages are imported. Some people think it is better to keep ::. However, please be aware that you will need to manually double-check the imported items if you make any changes to the DESCRIPTION file during development. My suggestion is to remove one or two repetitions to trigger the dependency check.
  • NOTE: Vectorize: for loops present, try to replace them by *apply funcitons.
    • In file R/main.R:
      • at line 200 found ' for (sam in seq_len(n) ) {'
      • at line 502 found ' for (sam in which(segupdated_data$segment.K==0)) {'
      • at line 674 found ' for (js in seq_len(nseg)) {'
      • at line 754 found ' for (sam in seq_len(ncol(X))) {'
      • at line 1284 found ' for (js in seq_len(nseg)) {'
      • at line 1384 found ' for( k_tmp in i_grouped_lst[order(vapply(i_grouped_lst,min,0))]){'
      • at line 1442 found ' for (clique in ordered_cliques) {'
      • at line 1515 found ' for (js in state) {'
    • In file R/misc.R:
      • at line 224 found ' for(pkg in c("nord","wesanderson","RColorBrewer")){'
    • In file R/Plotting.R:
      • at line 62 found ' for(ovlp in ovlp_status %>% dplyr::group_split(.data$queryHits)){'
    • In file R/Process_Bam.R:
      • at line 301 found ' for (part in parts) {'
      • at line 331 found ' for(s in all_strings){ sanity_check(s) }'
  • Important: Use file.path to replace paste
    • In file R/Process_Bam.R:
      • at line 112 found ' PATH = paste(env_dir,"bin",sep="/"),'
      • at line 113 found ' LD_LIBRARY_PATH = paste(env_dir,"lib",sep="/")'
      • at line 579 found ' PATH = paste(env_dir,"bin",sep="/"),'
      • at line 580 found ' LD_LIBRARY_PATH = paste(env_dir,"lib",sep="/")'
  • Important: Remove unused code.
    • In file R/Plotting.R:
      • at line 199 found ' theme(legend.position = "none") #+ xlim(0,NROW(mtrx_for_plotting)+1)'
  • Important: Need explanation the reason to use head or tail in following code.
    • In file R/main.R:
      • at line 1185 found ' rollYquant <- c(rep(rollYquant0[1],half.width), rollYquant0, rep(tail(rollYquant0,1),half.width-1))'
  • NOTE: Try to check the edge condition when using seq.int or seq_len. For example using seq.int(min(5, nrow(data))) to replace seq.int(5)
    • In file R/main.R:
      • at line 1393 found ' segtable2[nseg2,seq_len(3)] <- segment.table[nseg2,seq_len(3)]'
  • NOTE: Functional programming: code repetition.
    • repetition in coord_to_lst, get_depth_matrix_Rsamtools_gp, and get_depth_samtools_gp
      • in coord_to_lst
        • line 1:{
        • line 2: coord_lst <- coord %>% str_replace_all(",", "") %>% str_split(":|-",
        • line 3: simplify = TRUE) %>% as.list() %>% structure(names = c("chr",
        • line 4: "start", "end")) %>% within({
        • line 5: start <- as.numeric(start)
        • line 6: end <- as.numeric(end)
        • line 7: })
      • in get_depth_matrix_Rsamtools_gp
        • line 2: min_mapq = 30, min_base_quality = 0)
        • line 3: {
        • line 4: coord_lst <- coord %>% str_replace_all(",", "") %>% str_split(":|-",
        • line 5: simplify = TRUE) %>% as.list() %>% structure(names = c("chr",
        • line 6: "start", "end")) %>% within({
        • line 7: start <- as.numeric(start)
        • line 8: end <- as.numeric(end)
        • line 9: })
      • in get_depth_samtools_gp
        • line 2: bash_script_base, samtools = NULL)
        • line 3:{
        • line 4: coord_lst <- coord %>% str_replace_all(",", "") %>% str_split(":|-",
        • line 5: simplify = TRUE) %>% as.list() %>% structure(names = c("chr",
        • line 6: "start", "end")) %>% within({
        • line 7: start <- as.numeric(start)
        • line 8: end <- as.numeric(end)
        • line 9: })
    • repetition in detect_bp__update_ref__sub_updateY, and get_BPsegment_v2
      • in detect_bp__update_ref__sub_updateY
        • line 12: BPs.sam <- length(BPs)
        • line 13: if (BPs.sam > 0) {
        • line 14: if (BPs.sam%%2 > 0) {
        • line 15: BPs.sam <- BPs.sam + 1
        • line 16: newBP <- c(seq_along(Ydiff_sam)[-BPs])[which.max(abs(Ydiff_sam[-BPs]))]
        • line 17: BPs <- c(BPs, newBP)
        • line 18: }
        • line 19: nseg <- length(BPs)
        • line 20: nseg2 <- nseg + 1
        • line 21: srt.BPs <- sort(BPs)
        • line 22: segment.table <- data.frame(seg = seq_len(nseg2),
        • line 23: begin = c(1, srt.BPs + 1), end = c(srt.BPs, d))
        • line 24: segbps <- lapply(seq_len(nseg), FUN = function(i) segment.table$begin[i]:segment.table$end[i])
        • line 25: segbps[[1]] <- c(segbps[[1]], segment.table$begin[nseg2]:segment.table$end[nseg2])
        • line 26: Y.tmp <- Y
        • line 27: Q1.zscore <- max.zscore <- mean.zscore.outwinreg <- rep(1000,
        • line 28: nseg)
        • line 29: med.outwinreg <- len.outwinreg <- rep(0, nseg)
        • line 30: names(mean.zscore.outwinreg) <- seq_len(nseg)
        • line 31: for (js in seq_len(nseg)) {
        • line 32: outwinreg <- segbps[[js]]
        • line 33: len.outwinreg[js] <- length(outwinreg)
        • line 34: newmed <- median(Y[outwinreg, sam])
        • line 35: med.outwinreg[js] <- newmed
        • line 36: if (newmed > 0.05) {
        • line 37: Y.tmp[, sam] <- Y[, sam]/newmed
        • line 38: Z.tmp <- t(apply(Y.tmp, 1, function(x) pd.rate.hy(x,
        • line 39: qrsc = TRUE)))
        • line 40: mean.zscore.outwinreg[js] <- mean(abs(Z.tmp[outwinreg,
        • line 41: sam]))
        • line 42: Q1.zscore[js] <- quantile(abs(Z.tmp[, sam]),
        • line 43: probs = 0.25)
        • line 44: max.zscore[js] <- max(abs(Z.tmp[, sam]))
        • line 45: }
        • line 46: }
        • line 47: ireg <- which((max.zscore < 30) & (len.outwinreg >
        • line 48: 500))
        • line 49: if (length(ireg) > 0) {
        • line 50: baseseg <- ireg[which.min(Q1.zscore[ireg])]
        • line 51: }
        • line 52: else {
        • line 53: ireg <- which(max.zscore < 30)
        • line 54: baseseg <- ireg[which.min(Q1.zscore[ireg])]
        • line 55: }
        • line 56: baseseg
      • in get_BPsegment_v2
        • line 29: BPs.sam <- length(BPs)
        • line 30: if (BPs.sam > 0) {
        • line 31: if (BPs.sam%%2 > 0) {
        • line 32: BPs.sam <- BPs.sam + 1
        • line 33: newBP <- c(c(seq_len(length(ydiff)))[-BPs])[which.max(abs(ydiff[-BPs]))]
        • line 34: BPs <- c(BPs, newBP)
        • line 35: }
        • line 37: nseg <- length(BPs)
        • line 38: nseg2 <- nseg + 1
        • line 39: srt.BPs <- sort(BPs)
        • line 40: segment.table <- data.frame(seg = seq_len(nseg2), begin = c(1,
        • line 41: srt.BPs + 1), end = c(srt.BPs, d))
        • line 42: segbps <- lapply(seq_len(nseg), FUN = function(i) segment.table$begin[i]:segment.table$end[i])
        • line 43: segbps[[1]] <- c(segbps[[1]], segment.table$begin[nseg2]:segment.table$end[nseg2])
        • line 44: Y.tmp <- Y
        • line 45: Q1.zscore <- max.zscore <- mean.zscore.outwinreg <- rep(1000,
        • line 46: nseg)
        • line 47: med.outwinreg <- len.outwinreg <- rep(0, nseg)
        • line 48: names(mean.zscore.outwinreg) <- seq_len(nseg)
        • line 49: for (js in seq_len(nseg)) {
        • line 50: outwinreg <- segbps[[js]]
        • line 51: len.outwinreg[js] <- length(outwinreg)
        • line 52: newmed <- median(Y[outwinreg, sam])
        • line 53: med.outwinreg[js] <- newmed
        • line 54: if (newmed > 0.05) {
        • line 55: Y.tmp[, sam] <- Y[, sam]/newmed
        • line 56: Z.tmp <- t(apply(Y.tmp, 1, function(x) pd.rate.hy(x,
        • line 57: qrsc = TRUE)))
        • line 58: mean.zscore.outwinreg[js] <- mean(abs(Z.tmp[outwinreg,
        • line 59: sam]))
        • line 60: Q1.zscore[js] <- quantile(abs(Z.tmp[, sam]),
        • line 61: probs = 0.25)
        • line 62: max.zscore[js] <- max(abs(Z.tmp[, sam]))
        • line 63: }
        • line 64: }
        • line 65: ireg <- which((max.zscore < 30) & (len.outwinreg > 500))
        • line 66: if (length(ireg) > 0) {
        • line 67: baseseg <- ireg[which.min(Q1.zscore[ireg])]
        • line 68: }
        • line 69: else {
        • line 70: ireg <- which(max.zscore < 30)
        • line 71: baseseg <- ireg[which.min(Q1.zscore[ireg])]
        • line 72: }
    • repetition in finalize_segments_and_clusters, get_segments_and_clusters, and update_reference_segments
      • in finalize_segments_and_clusters
        • line 13: if (K > 1) {
        • line 14: clust_seg <- segclust(testdata, lmin = 300, Kmax = 10,
        • line 15: ncluster = (2:K), seg.var = c("z"), scale.variable = FALSE,
        • line 16: subsample_by = 60)
        • line 17: rescued_data$clust.list[[sam]] <- clust_seg
      • in get_segments_and_clusters
        • line 14: output <- tryCatch({
        • line 15: K <- segment.K_initial[sam]
        • line 16: if (K > 1) {
        • line 17: shift_seg <- segmentation(testdata, lmin = 300,
        • line 18: Kmax = 10, seg.var = c("z", "y"), subsample_by = 60,
        • line 19: scale.variable = FALSE)
        • line 20: K <- shift_seg$Kopt.lavielle
        • line 21: if (K > 1) {
        • line 22: clust_seg <- segclust(testdata, lmin = 300,
        • line 23: Kmax = 10, ncluster = (2:K), seg.var = c("z",
        • line 24: "y"), scale.variable = FALSE, subsample_by = 60)
        • line 25: out <- list(K = K, clust = clust_seg)
        • line 33: }
        • line 34: msg <- paste0(sam, "| done")
        • line 35: message(msg)
        • line 36: out
        • line 37: }, error = function(err) {
        • line 38: msg <- paste0(sam, "|", err)
        • line 39: message(msg)
        • line 41: if (K > 1) {
        • line 42: shift_seg <- segmentation(testdata, lmin = 300,
        • line 43: Kmax = 10, seg.var = c("z"), subsample_by = 60,
        • line 44: scale.variable = FALSE)
        • line 45: K <- shift_seg$Kopt.lavielle
        • line 46: if (K > 1) {
        • line 47: clust_seg <- segclust(testdata, lmin = 300,
        • line 48: Kmax = 10, ncluster = (2:K), seg.var = c("z"),
        • line 49: scale.variable = FALSE, subsample_by = 60)
        • line 50: out <- list(K = K, clust = clust_seg)
        • line 58: }
        • line 59: msg <- paste0(sam, "| done")
        • line 60: message(msg)
        • line 61: out
        • line 62: })
        • line 63: return(output)
      • in update_reference_segments
        • line 17: output <- tryCatch({
        • line 18: shift_seg <- segmentation(testdata, lmin = 300,
        • line 19: Kmax = 10, seg.var = c("z", "y"), subsample_by = 60,
        • line 20: scale.variable = FALSE)
        • line 21: K <- shift_seg$Kopt.lavielle
        • line 22: msg <- paste0(sam, "| done")
        • line 23: message(msg)
        • line 24: K
        • line 25: }, error = function(err) {
        • line 26: msg <- paste0(sam, "|", err)
        • line 27: message(msg)
        • line 29: shift_seg <- segmentation(testdata, lmin = 300,
        • line 30: Kmax = 10, seg.var = c("z"), subsample_by = 60,
        • line 31: scale.variable = FALSE)
        • line 32: K <- shift_seg$Kopt.lavielle
        • line 33: msg <- paste0(sam, "| done")
        • line 34: message(msg)
        • line 35: K
        • line 36: })
        • line 37: return(output)
        • line 46: output <- tryCatch({
        • line 47: K <- segment.K_initial[sam]
        • line 48: if (K > 1) {
        • line 49: clust_seg <- segclust(testdata, lmin = 300,
        • line 50: Kmax = 10, ncluster = (2:K), seg.var = c("z",
        • line 51: "y"), scale.variable = FALSE, subsample_by = 60)
        • line 52: result <- segment(clust_seg)
        • line 59: }
        • line 60: msg <- paste0(sam, "| done")
        • line 61: message(msg)
        • line 62: new_y
        • line 63: }, error = function(err) {
        • line 64: msg <- paste0(sam, "|", err)
        • line 65: message(msg)
        • line 67: K <- segment.K_initial[sam]
        • line 68: if (K > 1) {
        • line 69: clust_seg <- segclust(testdata, lmin = 300,
        • line 70: Kmax = 10, ncluster = (2:K), seg.var = c("z"),
        • line 71: scale.variable = FALSE, subsample_by = 60)
        • line 72: result <- segment(clust_seg)
        • line 85: }
        • line 86: msg <- paste0(sam, "| done")
        • line 87: message(msg)
        • line 88: new_y
        • line 89: })
        • line 90: return(output)
    • repetition in gene_cn_heatmaps, and get_gene_rnt_ori
      • in gene_cn_heatmaps
        • line 9: txdb <- makeTxDbFromGFF(gff3_fn, format = "gff3")
        • line 10: genes <- genes(txdb) %>% sort
        • line 11: cds <- cdsBy(txdb, by = "gene")
        • line 12: if (!is.null(exclude_genes)) {
        • line 13: genes <- genes[!(genes$gene_id %in% exclude_genes)]
      • in get_gene_rnt_ori
        • line 4: txdb <- makeTxDbFromGFF(gff3_fn, format = "gff3")
        • line 5: genes <- genes(txdb) %>% sort
        • line 6: cds <- cdsBy(txdb, by = "gene")
        • line 7: if (!is.null(exclude_genes)) {
        • line 8: cds <- cds[!(names(cds) %in% exclude_genes)]
    • repetition in gene_cn_heatmaps, integrative_heatmap, and plot_pileUp_multisample
      • in gene_cn_heatmaps
        • line 3: "in"))
        • line 4:{
        • line 5: if (length(baseline) == 1) {
        • line 6: baseline <- rep(baseline, NCOL(X_raw))
        • line 7: }
        • line 8: baseline_target <- baseline
      • in integrative_heatmap
        • line 23: }
        • line 24: if (length(baseline) == 1) {
        • line 25: baseline <- rep(baseline, NCOL(X_raw))
        • line 26: }
        • line 27: baseline_target <- baseline
      • in plot_pileUp_multisample
        • line 6: {
        • line 7: if (length(baseline) == 1) {
        • line 8: baseline <- rep(baseline, NCOL(X_raw))
        • line 9: }
    • repetition in get_depth_matrix, and get_depth_matrix_gp
      • in get_depth_matrix
        • line 1: mode = "samtools_basilisk", target_virus_name, N_cores = detectCores(),
        • line 2: min_mapq = 30, min_base_quality = 0, max_depth = 1e+05, modules = NULL,
        • line 3: envs = NULL, tmpdir = tempdir(), samtools = NULL, condaenv = "env_samtools",
        • line 4: condaenv_samtools_version = "1.21")
        • line 5:{
        • line 6: os_name <- Sys.info()["sysname"]
        • line 7: if (os_name == "Windows") {
        • line 8: if (mode != "Rsamtools") {
        • line 9: warning(glue("mode={mode} is not supported for {os_name}. Changing mode to Rsamtools..."))
        • line 10: mode <- "Rsamtools"
        • line 11: }
        • line 12: }
        • line 16: }
        • line 17: if (mode == "samtools_basilisk") {
        • line 18: if (!requireNamespace("basilisk", quietly = TRUE)) {
        • line 19: stop("R Package 'basilisk' does not exist. Please install it by following instructions in 'https://www.bioconductor.org/packages/release/bioc/html/basilisk.html'")
        • line 20: }
        • line 21: if (grepl("[^0-9.]", condaenv_samtools_version)) {
        • line 22: stop("Invalid samtools version number. Please find correct version number refering to 'https://anaconda.org/bioconda/samtools'.")
        • line 23: }
        • line 24: samtools_env <- BasiliskEnvironment(envname = condaenv,
        • line 25: pkgname = "ELViS", channels = c("conda-forge", "bioconda"),
        • line 26: packages = c(glue("samtools=={condaenv_samtools_version}")))
        • line 27: env_dir <- obtainEnvironmentPath(samtools_env)
        • line 28: envs <- c(PATH = paste(env_dir, "bin", sep = "/"), LD_LIBRARY_PATH = paste(env_dir,
        • line 29: "lib", sep = "/"))
        • line 30: }
        • line 31: if (mode == "Rsamtools") {
        • line 32: if (!requireNamespace("Rsamtools", quietly = TRUE)) {
        • line 33: stop("R Package 'Rsamtools' does not exist. Please install it by executing following command.\n\nif (!requireNamespace('BiocManager', quietly = TRUE))\n utils::install.packages('BiocManager')\n\nBiocManager::install('Rsamtools')")
        • line 34: }
        • line 35: out_mtrx <- get_depth_matrix_Rsamtools(bam_files = bam_files,
        • line 36: target_virus_name = target_virus_name, N_cores = N_cores,
        • line 37: max_depth = max_depth, min_mapq = min_mapq, min_base_quality = min_base_quality)
        • line 38: }
        • line 39: else if (mode %in% c("samtools_custom", "samtools_basilisk")) {
        • line 40: if (!dir.exists(tmpdir)) {
        • line 41: dir.create(tmpdir)
        • line 42: }
        • line 43: out_mtrx <- get_depth_matrix_samtools(bam_files = bam_files,
        • line 48: }
        • line 49: colnames(out_mtrx) <- basename(bam_files)
        • line 50: if (grepl("Error", out_mtrx[1, 1], ignore.case = TRUE)) {
        • line 51: stop(out_mtrx[1, 1])
        • line 52: }
        • line 53: return(out_mtrx)
      • in get_depth_matrix_gp
        • line 1: mode = "samtools_custom", coord, N_cores = detectCores(),
        • line 2: min_mapq = 30, min_base_quality = 0, max_depth = 1e+05, modules = NULL,
        • line 3: envs = NULL, tmpdir = tempdir(), samtools = NULL, condaenv = "env_samtools",
        • line 4: condaenv_samtools_version = "1.21")
        • line 5:{
        • line 6: os_name <- Sys.info()["sysname"]
        • line 7: if (os_name == "Windows") {
        • line 8: if (mode != "Rsamtools") {
        • line 9: warning(glue("mode={mode} is not supported for {os_name}. Changing mode to Rsamtools..."))
        • line 10: mode <- "Rsamtools"
        • line 11: }
        • line 12: }
        • line 13: if (mode == "samtools_basilisk") {
        • line 14: if (!requireNamespace("basilisk", quietly = TRUE)) {
        • line 15: stop("R Package 'basilisk' does not exist. Please install it by following instructions in 'https://www.bioconductor.org/packages/release/bioc/html/basilisk.html'")
        • line 16: }
        • line 17: if (grepl("[^0-9.]", condaenv_samtools_version)) {
        • line 18: stop("Invalid samtools version number. Please find correct version number refering to 'https://anaconda.org/bioconda/samtools'.")
        • line 19: }
        • line 20: samtools_env <- BasiliskEnvironment(envname = condaenv,
        • line 21: pkgname = "ELViS", channels = c("conda-forge", "bioconda"),
        • line 22: packages = c(glue("samtools=={condaenv_samtools_version}")))
        • line 23: env_dir <- obtainEnvironmentPath(samtools_env)
        • line 24: envs <- c(PATH = paste(env_dir, "bin", sep = "/"), LD_LIBRARY_PATH = paste(env_dir,
        • line 25: "lib", sep = "/"))
        • line 26: }
        • line 27: if (mode == "Rsamtools") {
        • line 28: if (!requireNamespace("Rsamtools", quietly = TRUE)) {
        • line 29: stop("R Package 'Rsamtools' does not exist. Please install it by executing following command.\n\nif (!requireNamespace('BiocManager', quietly = TRUE))\n utils::install.packages('BiocManager')\n\nBiocManager::install('Rsamtools')")
        • line 30: }
        • line 31: out_mtrx <- get_depth_matrix_Rsamtools_gp(bam_files = bam_files,
        • line 32: coord = coord, N_cores = N_cores, max_depth = max_depth,
        • line 33: min_mapq = min_mapq, min_base_quality = min_base_quality)
        • line 34: }
        • line 35: else if (mode %in% c("samtools_custom", "samtools_basilisk")) {
        • line 36: if (!dir.exists(tmpdir)) {
        • line 37: dir.create(tmpdir)
        • line 38: }
        • line 39: out_mtrx <- get_depth_matrix_samtools_gp(bam_files = bam_files,
        • line 43: }
        • line 44: colnames(out_mtrx) <- basename(bam_files)
        • line 45: if (grepl("Error", out_mtrx[1, 1], ignore.case = TRUE)) {
        • line 46: stop(out_mtrx[1, 1])
        • line 47: }
        • line 48: return(out_mtrx)
    • repetition in get_depth_matrix_Rsamtools, and get_depth_matrix_Rsamtools_gp
      • in get_depth_matrix_Rsamtools
        • line 7: target_grng <- data.frame(chr = target_virus_name, start = 1,
        • line 8: end = virus_genome_size) %>% makeGRangesFromDataFrame()
        • line 9: paramScanBam <- Rsamtools::ScanBamParam(which = target_grng)
        • line 10: paramPileup <- Rsamtools::PileupParam(min_base_quality = min_base_quality,
        • line 11: max_depth = max_depth, min_mapq = min_mapq, min_nucleotide_depth = 0,
        • line 12: distinguish_strands = FALSE, distinguish_nucleotides = FALSE,
        • line 13: ignore_query_Ns = FALSE, include_deletions = FALSE,
        • line 14: include_insertions = FALSE, left_bins = NULL, query_bins = NULL,
        • line 15: cycle_bins = NULL)
        • line 16: depth_mtrx <- mclapply(X = bam_files, mc.cores = N_cores,
        • line 17: FUN = get_depth_Rsamtools, scanBamParam = paramScanBam,
        • line 18: pileupParam = paramPileup) %>% do.call(cbind, .)
        • line 19: return(depth_mtrx)
      • in get_depth_matrix_Rsamtools_gp
        • line 10: target_grng <- data.frame(chr = coord_lst$chr, start = coord_lst$start,
        • line 11: end = coord_lst$end) %>% makeGRangesFromDataFrame()
        • line 12: paramScanBam <- Rsamtools::ScanBamParam(which = target_grng)
        • line 13: paramPileup <- Rsamtools::PileupParam(min_base_quality = min_base_quality,
        • line 14: max_depth = max_depth, min_mapq = min_mapq, min_nucleotide_depth = 0,
        • line 15: distinguish_strands = FALSE, distinguish_nucleotides = FALSE,
        • line 16: ignore_query_Ns = FALSE, include_deletions = FALSE,
        • line 17: include_insertions = FALSE, left_bins = NULL, query_bins = NULL,
        • line 18: cycle_bins = NULL)
        • line 19: depth_mtrx <- mclapply(X = bam_files, mc.cores = N_cores,
        • line 20: FUN = get_depth_Rsamtools, scanBamParam = paramScanBam,
        • line 21: pileupParam = paramPileup) %>% do.call(cbind, .)
        • line 22: return(depth_mtrx)
    • repetition in get_depth_matrix_samtools, and get_depth_matrix_samtools_gp
      • in get_depth_matrix_samtools
        • line 1: function (bam_files, target_virus_name, N_cores = min(10,
        • line 2: detectCores()), min_mapq = 30, min_base_quality = 0,
        • line 3: modules = NULL, envs = NULL, tmpdir = tempdir(), samtools = NULL)
        • line 4: {
      • in get_depth_matrix_samtools_gp
        • line 1: function (bam_files, coord, N_cores = detectCores(), min_mapq = 30,
        • line 2: min_base_quality = 0, modules = NULL, envs = NULL, tmpdir = tempdir(),
        • line 3: samtools = NULL)
        • line 4: {
    • repetition in get_depth_samtools, and get_depth_samtools_gp
      • in get_depth_samtools
        • line 5: depth_bed_fn <- glue("{tmpdir}/{UUIDgenerate()}_{vec_i}.depth.bed")
        • line 6: run_samtools(bash_script_base = bash_script_base, command = glue("depth -a -r '{target_virus_name}' --min-MQ {min_mapq} --min-BQ {min_base_quality} -g 256 {bam_fn}"),
        • line 7: output_name = depth_bed_fn, samtools = samtools, depth_count_only = TRUE)
        • line 8: depth_bed <- unlist(read.csv(depth_bed_fn, sep = "\t", header = FALSE),
        • line 9: use.names = FALSE)
        • line 10: sanity_check(depth_bed_fn)
        • line 11: system2(command = "rm", args = depth_bed_fn)
        • line 12: return(depth_bed)
      • in get_depth_samtools_gp
        • line 11: depth_bed_fn <- glue("{tmpdir}/{UUIDgenerate()}_{vec_i}.depth.bed")
        • line 12: run_samtools(bash_script_base, command <- glue("depth -a -r '{coord_lst$chr}:{coord_lst$start}-{coord_lst$end}' --min-MQ {min_mapq} --min-BQ {min_base_quality} -g 256 {bam_fn}"),
        • line 13: output_name <- depth_bed_fn, samtools <- samtools, depth_count_only <- TRUE)
        • line 14: depth_bed <- unlist(read.csv(depth_bed_fn, sep = "\t", header = FALSE),
        • line 15: use.names = FALSE)
        • line 16: sanity_check(depth_bed_fn)
        • line 17: system2(command = "rm", args = depth_bed_fn)
        • line 18: return(depth_bed)
    • repetition in get_gene_anno_plot, and get_gene_rnt
      • in get_gene_anno_plot
        • line 3:{
        • line 4: mc <- match.call()
        • line 5: encl <- parent.env(environment())
        • line 6: called_args <- as.list(mc)[-1]
        • line 7: default_args <- encl$_default_args
        • line 8: default_args <- default_args[setdiff(names(default_args),
        • line 9: names(called_args))]
        • line 10: called_args[encl$_omit_args] <- NULL
        • line 11: args <- c(lapply(called_args, eval, parent.frame()), lapply(default_args,
        • line 12: eval, envir = environment()))
        • line 13: key <- encl$_hash(c(encl$_f_hash, args, lapply(encl$_additional,
        • line 14: function(x) eval(x[[2L]], environment(x)))))
        • line 15: res <- encl$_cache$get(key)
        • line 16: if (inherits(res, "key_missing")) {
        • line 17: mc[[1L]] <- encl$_f
        • line 18: res <- withVisible(eval(mc, parent.frame()))
        • line 19: encl$_cache$set(key, res)
        • line 20: }
        • line 21: if (res$visible) {
        • line 22: res$value
        • line 23: }
        • line 24: else {
        • line 25: invisible(res$value)
        • line 26: }
        • line 27:}, memoised = TRUE, class = c("memoised", "function"))), where = "namespace:ELViS",
      • in get_gene_rnt
        • line 3:{
        • line 4: mc <- match.call()
        • line 5: encl <- parent.env(environment())
        • line 6: called_args <- as.list(mc)[-1]
        • line 7: default_args <- encl$_default_args
        • line 8: default_args <- default_args[setdiff(names(default_args),
        • line 9: names(called_args))]
        • line 10: called_args[encl$_omit_args] <- NULL
        • line 11: args <- c(lapply(called_args, eval, parent.frame()), lapply(default_args,
        • line 12: eval, envir = environment()))
        • line 13: key <- encl$_hash(c(encl$_f_hash, args, lapply(encl$_additional,
        • line 14: function(x) eval(x[[2L]], environment(x)))))
        • line 15: res <- encl$_cache$get(key)
        • line 16: if (inherits(res, "key_missing")) {
        • line 17: mc[[1L]] <- encl$_f
        • line 18: res <- withVisible(eval(mc, parent.frame()))
        • line 19: encl$_cache$set(key, res)
        • line 20: }
        • line 21: if (res$visible) {
        • line 22: res$value
        • line 23: }
        • line 24: else {
        • line 25: invisible(res$value)
        • line 26: }
        • line 27:}, memoised = TRUE, class = c("memoised", "function"))), where = "namespace:ELViS",
    • repetition in get_gene_anno_plot_ori, and get_gene_rnt_ori
      • in get_gene_anno_plot_ori
        • line 67: is_custom_palette <- FALSE
        • line 68: if (length(col_pal) >= length(Gene_levels)) {
        • line 69: is_custom_palette <- TRUE
        • line 70: if (is.null(names(col_pal)) || length(setdiff(cds_plotdata$Gene,
        • line 71: names(col_pal))) != 0) {
        • line 72: col_pal_fin <- structure(col_pal[seq_len(length(Gene_levels))],
        • line 73: names = Gene_levels)
        • line 74: }
        • line 75: else {
        • line 76: col_pal_fin <- col_pal
        • line 77: }
        • line 78: }
      • in get_gene_rnt_ori
        • line 12: is_custom_palette <- FALSE
        • line 13: if (length(col_pal_gene) >= length(Gene_levels)) {
        • line 14: is_custom_palette <- TRUE
        • line 15: if (is.null(names(col_pal_gene)) || length(setdiff(Gene_levels,
        • line 16: names(col_pal_gene))) != 0) {
        • line 17: col_pal_fin <- structure(col_pal_gene[seq_len(length(Gene_levels))],
        • line 18: names = Gene_levels)
        • line 19: }
        • line 20: else {
        • line 21: col_pal_fin <- col_pal_gene
        • line 22: }
        • line 23: }
    • repetition in get_window_v1, and get_window2
      • in get_window_v1
        • line 5: win <- matrix(c(1, rep(ips, each = 2), dim(Y)[1]), ncol = 2,
        • line 6: byrow = TRUE)
        • line 7: win <- win[which((win[, 2] - win[, 1]) > 10), ]
        • line 8: return(win)
      • in get_window2
        • line 5: 0)
        • line 6: win <- matrix(c(1, rep(ips, each = 2), d), ncol = 2, byrow = TRUE)
        • line 7: win <- win[which((win[, 2] - win[, 1]) > min.length), ]
        • line 8: return(win)

Documentation

  • Note: Consider to include a package man page.
  • Important: Consider to include a readme file for all extdata.
  • Note: Vignette should use BiocStyle package for formatting.
    • rmd file vignettes/ELViSPrecisely_Toy_Example.Rmd
  • Important: Vignette should have an Introduction section. Please move the install paragraph to a new section.
    • rmd file vignettes/ELViSPrecisely_Toy_Example.Rmd
  • Important: In sample code, tmpdir should be output of tempdir.
    • rmd file vignettes/ELViSPrecisely_Toy_Example.Rmd
  • Important: Remove the section about installation from github.
    • rmd file vignettes/ELViSPrecisely_Toy_Example.Rmd
  • Note: Vignette includes motivation for submitting to Bioconductor as part of the abstract/intro of the main vignette.
    • rmd file vignettes/ELViSPrecisely_Toy_Example.Rmd
  • Important: Ensure the news file is updated to the latest version available.
  • Note: typos:
WORD FOUND IN.
annotaiton plot_pileUp_multisample.Rd:41
FInal ELViS_toy_run_result.Rd:13
Indecate get_new_baseline.Rd:12
modulefile get_depth_matrix.Rd:38
THe run_ELViS.Rd:17

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place OK
Projects
None yet
Development

No branches or pull requests

4 participants