Site2Target #3666

Open
9 of 10 tasks
peymanzarrineh opened this issue Nov 28, 2024 · 24 comments
Labels: 2. review in progress, OK

Comments

@peymanzarrineh

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Collaborator

Hi @peymanzarrineh

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: Site2Target
Type: Package
Title: An R package to associate peaks and target genes
Version: 0.99.0
Authors@R: person("Peyman Zarrineh", email="[email protected]", role=c("cre", "aut"), comment = c(ORCID = "0000-0003-4820-4101"))
Description: Statistics implemented for both peak-wise and gene-wise associations. In peak-wise associations, the p-value of the target genes of a given set of peaks are calculated. Negative binomial or Poisson distributions can be used for modeling the unweighted peaks targets and log-nromal can be used to model the weighted peaks. In gene-wise associations a table consisting of a set of genes, mapped to specific peaks, is generated using the given rules.
BugReports: https://github.com/fls-bioinformatics-core/Site2Target/issues
License: GPL-2
Encoding: UTF-8
LazyData: false
Imports: S4Vectors, stats, utils, BiocGenerics, GenomeInfoDb, MASS, IRanges, GenomicRanges
biocViews: Annotation, ChIPSeq, Software, Epigenetics, GeneExpression, GeneTarget
RoxygenNote: 7.3.2
Suggests: BiocStyle, knitr, rmarkdown
VignetteBuilder: knitr

@bioc-issue-bot bioc-issue-bot added the 1. awaiting moderation submitted and waiting clearance to access resources label Nov 28, 2024
@lshep
Contributor

lshep commented Dec 10, 2024

When I try to run R CMD build on your package, I get the following ERROR:

R CMD build Site2Target 
* checking for file 'Site2Target/DESCRIPTION' ... OK
* preparing 'Site2Target':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘Site2Target.Rmd’ using rmarkdown
Could not find bibliography file: /Site2Target.bib
Error running filter /usr/bin/pandoc-citeproc:
Filter returned error status 1
Error: processing vignette 'Site2Target.Rmd' failed with diagnostics:
pandoc document conversion failed with error 83
--- failed re-building ‘Site2Target.Rmd’

SUMMARY: processing the following file failed:
  ‘Site2Target.Rmd’

Error: Vignette re-building failed.
Execution halted

@peymanzarrineh
Author

Hello,

I got this error, but I assumed it was a pandoc versioning issue, since I could build the .html file from the .Rmd file without any problem. Do you know what the problem is?

Thank you,

@lshep
Contributor

lshep commented Dec 10, 2024

I think you can just use bibliography: Site2Target.bib instead of bibliography: "`r file.path(system.file('vignettes', package = 'Site2Target'), 'Site2Target.bib')`". If that does not work, I would ask on the [email protected]

@peymanzarrineh
Author

Thank you very much! That solved the problem. Now I get no errors or warnings from check and BiocCheck. My problem now is that I cannot push a new version to Bioconductor. I run:

git remote add origin https://github.com/fls-bioinformatics-core/Site2Target.git
git remote add upstream [email protected]:packages/Site2Target.git
git remote -v
git fetch --all

"error: Could not fetch upstream"

It seems that "[email protected]:packages/Site2Target" does not exist. Should I open a new issue and start from the beginning?

@lshep
Contributor

lshep commented Dec 11, 2024

It is not yet on Bioconductor; this is still in the preview stage. I will check your GitHub repo for the updates soon.

@lshep
Contributor

lshep commented Dec 18, 2024

I'll move this forward to building, but please correct the following before an in-depth review.

Please don't use exportPattern("^[[:alpha:]]+"); you should be selectively
importing and exporting. Please provide a complete NAMESPACE.

Please also provide an inst/scripts directory that describes how the data
in inst/extdata was generated. It can be code, pseudo-code, or text but
should minimally list any source or licensing information.
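
For the NAMESPACE request above, a minimal sketch of explicit roxygen2 tags on a single function may help. The function name countPerChromosome and its body are hypothetical and are not taken from Site2Target; only the @export and @importFrom tags are the point.

```r
# Hypothetical function; name and body are illustrative only and are
# not part of Site2Target.

#' Count entries per chromosome
#'
#' @param chr Character vector of chromosome names.
#' @return A named integer vector of counts.
#' @importFrom stats setNames
#' @export
countPerChromosome <- function(chr) {
    tab <- table(chr)
    setNames(as.integer(tab), names(tab))
}

# After editing the tags, regenerate NAMESPACE (needs devtools/roxygen2):
#   devtools::document()
# The generated NAMESPACE should then list explicit entries such as:
#   export(countPerChromosome)
#   importFrom(stats,setNames)
res <- countPerChromosome(c("chr21", "chr21", "chr1"))
```

With one @export tag per public function, roxygen2 writes one export() line per function, so exportPattern() is no longer needed.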

@lshep lshep added the pre-check passed pre-review performed and ready to be added to git label Dec 18, 2024
@peymanzarrineh
Author

Thank you for your reply. Strangely, for my previous package the exports were taken care of by Roxygen, but not for this one. I will fix it.
My only problem is with the second part, which was not asked for in my previous package either. I made small datasets from our publicly available data on GEO, for which the paper will appear soon (almost accepted), and from publicly available Hi-C data. I reduced the files to chr21, which is a very small chromosome. I do not have much code to share. Can you give me an example of inst/scripts, or maybe a README file that explains the data?

@lshep
Contributor

lshep commented Dec 18, 2024

If you put the proper import / importFrom / export lines in the documentation, maybe you just need to run devtools::document() to generate the new NAMESPACE?

Even just a text file describing what you say here, with a reference that it is from GEO, so we know it can be distributed. Ideally, if someone wanted to generate their own, they would know where to get started.
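
A script in inst/scripts could be as short as the sketch below. Everything in it is an assumption made for illustration: the file name, the toy table and the column names are hypothetical, and the real GEO accession and licence details are deliberately left for the author to fill in.

```r
# inst/scripts/make-extdata.R (hypothetical file name)
# Source: public GEO data and public Hi-C data.
# Accession numbers and licence: to be filled in by the package author.

# Toy stand-in for a peak table downloaded from the public source.
peaks <- data.frame(
    chr   = c("chr1", "chr21", "chr21", "chrX"),
    start = c(100L, 2000L, 5000L, 300L),
    end   = c(200L, 2100L, 5100L, 400L)
)

# Keep only chromosome 21 so the shipped example data stay small.
peaksChr21 <- peaks[peaks$chr == "chr21", , drop = FALSE]

# Write the reduced table as tab-separated text.
outFile <- file.path(tempdir(), "peaks_chr21.tsv")
utils::write.table(peaksChr21, file = outFile, sep = "\t",
                   quote = FALSE, row.names = FALSE)
```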

@bioc-issue-bot
Collaborator

Your package has been added to git.bioconductor.org to continue the
pre-review process. A build report will be posted shortly. Please
fix any ERROR and WARNING in the build report before a reviewer is
assigned or provide a justification on why you feel the ERROR or
WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. All changes should be
pushed to git.bioconductor.org moving forward. It is required to push a
version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your GitHub SSH keys for git.bioconductor.org
access. To manage keys and future access you may want to activate your
Bioconductor Git Credentials Account.

@bioc-issue-bot bioc-issue-bot added pre-review on bioconductor git and access to on demand build but not assigned reviewer until build report clean and removed 1. awaiting moderation submitted and waiting clearance to access resources pre-check passed pre-review performed and ready to be added to git labels Dec 18, 2024
@bioc-issue-bot
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 24.04.1 LTS): Site2Target_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/Site2Target to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@peymanzarrineh
Author

Thank you! I fixed your notes and the error, which was the gitignore thing.
Now I have a problem with pushing the new version.

I ran:
git remote add origin https://github.com/fls-bioinformatics-core/Site2Target.git

git remote add upstream [email protected]:packages/Site2Target.git

git remote -v

git fetch --all

I see:
origin https://github.com/fls-bioinformatics-core/Site2Target.git (fetch)
origin https://github.com/fls-bioinformatics-core/Site2Target.git (push)
upstream [email protected]:packages/Site2Target.git (fetch)
upstream [email protected]:packages/Site2Target.git (push)

which is correct, but the rest does not work:

git merge upstream/mster
git merge origin/main
git push upstream master

Can you help with this? What is the problem, and what should I write?
Thank you

@lshep
Contributor

lshep commented Dec 20, 2024

Bioconductor does not have a main or master branch; it uses devel as the default branch. Please see http://contributions.bioconductor.org/git-version-control.html#new-package-workflow (step 5 explains how to push to a branch that has a different name).

@peymanzarrineh
Author

Thank you! But I am still stuck. I think it is different from my previous package. Can you help me?

This is right:

git remote -v
origin https://github.com/fls-bioinformatics-core/Site2Target.git (fetch)
origin https://github.com/fls-bioinformatics-core/Site2Target.git (push)
upstream [email protected]:packages/Site2Target.git (fetch)
upstream [email protected]:packages/Site2Target.git (push)

This seems right:

git fetch --all
Fetching origin
Fetching upstream

From here on, nothing works:

git merge upstream/devel
fatal: refusing to merge unrelated histories

git merge upstream/master
merge: upstream/master - not something we can merge

git push upstream main:devel
error: src refspec main does not match any
error: failed to push some refs to '[email protected]:packages/Site2Target.git'

git push origin main
error: src refspec main does not match any
error: failed to push some refs to 'https://github.com/fls-bioinformatics-core/Site2Target.git'

@lshep
Contributor

lshep commented Dec 23, 2024

It looks like your default branch is named origin? May I assume the local branch you are on matches that? Check with git branch. If it is called origin, then the command would be git push upstream origin:devel.

@peymanzarrineh
Author

Thank you very much! When I go to the Site2Target GitHub repository, it has only one branch, called Origin. Whatever I do, I get an error:

git branch
* master

git push upstream origin:devel
error: src refspec origin does not match any
error: failed to push some refs to '[email protected]:packages/Site2Target.git'

git push upstream master:devel
To git.bioconductor.org:packages/Site2Target.git
! [rejected] master -> devel (non-fast-forward)
error: failed to push some refs to '[email protected]:packages/Site2Target.git'
hint: Updates were rejected because a pushed branch tip is behind its remote
hint: counterpart. Check out this branch and integrate the remote changes
hint: (e.g. 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

@lshep
Contributor

lshep commented Dec 24, 2024

Can you do as it suggests and run git pull and git pull upstream?

@peymanzarrineh
Author

Thank you! I am still stuck.
git remote -v
origin https://github.com/fls-bioinformatics-core/Site2Target.git (fetch)
origin https://github.com/fls-bioinformatics-core/Site2Target.git (push)
upstream [email protected]:packages/Site2Target (fetch)
upstream [email protected]:packages/Site2Target (push)

git push upstream devel
Everything up-to-date

git branch
* devel
  master

git pull
Already up-to-date.

git pull upstream
Already up-to-date.

git push upstream origin:devel
error: src refspec origin does not match any
error: failed to push some refs to 'git.bioconductor.org:packages/Site2Target'

@lshep
Contributor

lshep commented Dec 24, 2024

Per your output there, you are on a branch called devel. Whatever branch it says you're on is the one you should use in your call.

@peymanzarrineh
Author

I do not have devel in my local (origin) repository. It has just one branch. The devel one is on the upstream. Something is wrong, and I was trying to correct it so that I could run: git push upstream origin:devel

@bioc-issue-bot
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 972876c8e5c20524025dde329a406ee29195a181

@bioc-issue-bot
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 24.04.1 LTS): Site2Target_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/Site2Target to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot bioc-issue-bot added OK and removed ERROR labels Dec 24, 2024
@lshep lshep added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed pre-review on bioconductor git and access to on demand build but not assigned reviewer until build report clean labels Jan 2, 2025
@bioc-issue-bot
Collaborator

A reviewer has been assigned to your package for an in-depth review.
Please respond accordingly to any further comments from the reviewer.

@jianhong

jianhong commented Jan 2, 2025

Package 'Site2Target' Review

Thank you for submitting your package to Bioconductor. The package passed check and build. However, there are several things that need to be fixed. Please try to answer the comments line by line when you are ready for a second review.
Code: NOTE: please consider; Important: must be addressed.

The DESCRIPTION file

  • Important: Depends field is not found in DESCRIPTION.
  • Important: R version is not clear in DESCRIPTION.
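
One way to address both DESCRIPTION items is a Depends field that states a minimum R version. The sketch below is only an illustration: the version 4.4.0 is a placeholder assumption, not a value given in this review, and the check uses base R's read.dcf() on a temporary file rather than the real DESCRIPTION.

```r
# Hypothetical DESCRIPTION fragment; the R version is a placeholder.
desc <- c(
    "Package: Site2Target",
    "Depends: R (>= 4.4.0)"
)
descFile <- tempfile("DESCRIPTION")
writeLines(desc, descFile)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files
# and returns a character matrix with one column per field.
fields <- read.dcf(descFile)
hasDepends <- "Depends" %in% colnames(fields)
declaresR  <- hasDepends && grepl("^R \\(>= ", fields[1, "Depends"])
```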

General package development

R code

  • NOTE: :: is not suggested in source code unless you can make sure all the packages are imported. Some people think it is better to keep ::. However, please be aware that you will need to manually double-check the imported items if you make any changes to the DESCRIPTION file during development. My suggestion is to remove one or two repetitions to trigger the dependency check.
  • Important: 1:n is not suggested in source code. Use seq_along, seq_len or seq.int instead.
    • In file R/peakwiseAssociations.R:
      • at line 222 found ' acceptedInds <- c(1:geneNumber)'
      • at line 433 found ' acceptedInds <- c(1:geneNumber)'
    • In file R/utils.R:
      • at line 168 found ' chrs <- tmp[(c(1:len)*2-1)]'
      • at line 169 found ' Ranges <- tmp[(c(1:len)*2)]'
      • at line 171 found ' start <- tmp[(c(1:len)*2-1)]'
      • at line 172 found ' end <- tmp[(c(1:len)*2)]'
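
The reason behind the 1:n item can be seen with a zero-length input. This is a small sketch with toy values, not code from the package; the variable names mirror the flagged lines only for readability.

```r
geneNumber <- 0L  # edge case: no genes

# 1:n counts down when n is 0, giving two bogus indices (1 and 0).
badInds <- c(1:geneNumber)

# seq_len(n) gives an empty integer vector when n is 0.
goodInds <- seq_len(geneNumber)

# Interleaved odd/even positions, in the style of the R/utils.R lines.
len <- 3L
oddPos  <- seq_len(len) * 2L - 1L
evenPos <- seq_len(len) * 2L
```

A loop or subset driven by badInds would run on indices that do not exist, while goodInds makes it a no-op.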
  • NOTE: Vectorize: for loops present, try to replace them by *apply functions.
    • In file R/genewiseAssociations.R:
      • at line 21 found ' for(i in seq_len(queryLen))'
      • at line 304 found ' for(i in seq_len(commonStrandsNumber))'
      • at line 322 found ' for(j in seq_len(currentGeneNumber))'
      • at line 341 found ' for(j in seq_len(currentGeneNumber))'
      • at line 574 found ' for(i in seq_len(mapNumber))'
      • at line 626 found ' for(i in seq_len(mapNumber))'
      • at line 837 found ' for(i in seq_len(commonStrandsNumber))'
      • at line 854 found ' for(j in seq_len(currentPeaklen))'
      • at line 866 found ' for(j in seq_len(currentGeneslen))'
    • In file R/peakwiseAssociations.R:
      • at line 55 found ' for(i in seq_len(geneNumber))'
      • at line 396 found ' for(i in seq_len(geneNumber))'
      • at line 563 found ' for(i in seq_len(overlapsNumber))'
      • at line 744 found ' for(i in seq_len(overlapsNumber))'
      • at line 778 found ' for(i in seq_len(geneNumber))'
      • at line 794 found ' for(siteCounter in seq_len(tmpSiteNumber))'
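
As an illustration of the vectorization note, here is a sketch on toy data that counts hits per gene three ways. It assumes the loop body only counts matches per index, which may be simpler than what the flagged loops actually do.

```r
geneNumber  <- 4L
mapGeneInds <- c(1L, 1L, 3L, 4L, 4L, 4L)  # toy query-hit indices

# Loop version, similar in shape to the loops flagged above.
countsLoop <- integer(geneNumber)
for (i in seq_len(geneNumber)) {
    countsLoop[i] <- sum(mapGeneInds == i)
}

# vapply() version: same result, with a declared return type.
countsApply <- vapply(seq_len(geneNumber),
                      function(i) sum(mapGeneInds == i),
                      integer(1))

# Fully vectorized version using tabulate().
countsVec <- tabulate(mapGeneInds, nbins = geneNumber)
```

vapply() is used rather than sapply() because it fails loudly if an element has an unexpected type or length.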
  • Important: finish TODO list.
    • In file R/genewiseAssociations.R:
      • at line 261 found ' # Remove interactions lower than distance ############# <----- This can become a function'
  • Important: Please consider adding drop=FALSE/TRUE to avoid/secure the reduction of dimension for matrices and arrays. Ignore this if using data.table.
    • In file R/genewiseAssociations.R:
      • at line 384 found ' df <- df[tmpInds, ]'
    • In file R/utils.R:
      • at line 53 found ' ranges=IRanges::IRanges(Table[,startInd], Table[,endInd]))'
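
The effect of drop can be checked on toy objects. This is a minimal sketch with made-up data; the column names start and end are assumptions chosen to resemble the flagged lines.

```r
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("start", "end")))

oneRow  <- 2L
dropped <- m[oneRow, ]                # collapses to a named vector
kept    <- m[oneRow, , drop = FALSE]  # stays a 1 x 2 matrix

# Selecting a single column of a data frame also drops to a vector
# unless drop = FALSE is given.
df     <- data.frame(start = 1:3, end = 4:6)
colVec <- df[, "start"]
colDf  <- df[, "start", drop = FALSE]
```

Row subsetting such as df[tmpInds, ] keeps a data frame when there are several columns, but it drops to a vector if the data frame happens to have only one column.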
  • NOTE: Functional programming: code repetition.
    • repetition in addColumn2geneWiseAssociation, and addRelation2geneWiseAssociation
      • in addColumn2geneWiseAssociation
        • line 1: function (type = "", name = NULL, coordinates = NULL, columnName = NA,
        • line 2: column, inFile = "geneWiseAssociation", outFile = "geneWiseAssociation")
        • line 3: {
        • line 4: if (is.na(columnName)) {
        • line 5: stop("Column name should be provided")
        • line 6: }
        • line 7: else {
        • line 8: columnName <- removeReserveCharacter(columnName)
        • line 9: }
        • line 10: if (!dir.exists(inFile)) {
        • line 11: stop("The user provided directory does not exist")
        • line 12: }
        • line 13: if (!file.exists(file.path(inFile, "link.tsv"))) {
        • line 14: stop("The gene-peaak link file does not exist in the directory")
        • line 15: }
        • line 16: interactionTable <- utils::read.table(file.path(inFile,
        • line 17: "link.tsv"), header = TRUE, sep = "\t")
        • line 18: if (!file.exists(file.path(inFile, "gene.tsv"))) {
        • line 19: stop("The gene information file does not exist in the directory")
        • line 20: }
        • line 21: geneTable <- utils::read.table(file.path(inFile, "gene.tsv"),
        • line 22: header = TRUE, sep = "\t")
        • line 23: if (!file.exists(file.path(inFile, "peak.tsv"))) {
        • line 24: stop("The peaak file does not exist in the directory")
        • line 25: }
        • line 26: peakTable <- utils::read.table(file.path(inFile, "peak.tsv"),
        • line 27: header = TRUE, sep = "\t")
        • line 114: if (!(dir.exists(outFile))) {
        • line 115: dir.create(outFile)
        • line 116: }
        • line 117: utils::write.table(interactionTable, file = file.path(outFile,
        • line 118: "link.tsv"), row.names = FALSE, col.names = TRUE,
        • line 119: quote = FALSE, sep = "\t")
      • in addRelation2geneWiseAssociation
        • line 1: function (strand1 = NULL, strand2 = NULL, columnName, column,
        • line 2: inFile = "geneWiseAssociation", outFile = "geneWiseAssociation")
        • line 3: {
        • line 4: if (is.na(columnName)) {
        • line 5: stop("Column nam should be provided")
        • line 6: }
        • line 7: else {
        • line 8: columnName <- removeReserveCharacter(columnName)
        • line 9: }
        • line 10: if (!dir.exists(inFile)) {
        • line 11: stop("The user provided directory does not exist")
        • line 12: }
        • line 13: if (!file.exists(file.path(inFile, "link.tsv"))) {
        • line 14: stop("The gene-peaak link file does not exist in the directory")
        • line 15: }
        • line 16: interactionTable <- utils::read.table(file.path(inFile,
        • line 17: "link.tsv"), header = TRUE, sep = "\t")
        • line 18: if (!file.exists(file.path(inFile, "gene.tsv"))) {
        • line 19: stop("The gene information file does not exist in the directory")
        • line 20: }
        • line 21: geneTable <- utils::read.table(file.path(inFile, "gene.tsv"),
        • line 22: header = TRUE, sep = "\t")
        • line 25: if (!file.exists(file.path(inFile, "peak.tsv"))) {
        • line 26: stop("The peaak file does not exist in the directory")
        • line 27: }
        • line 28: peakTable <- utils::read.table(file.path(inFile, "peak.tsv"),
        • line 29: header = TRUE, sep = "\t")
        • line 92: if (!(dir.exists(outFile))) {
        • line 93: dir.create(outFile)
        • line 94: }
        • line 95: utils::write.table(interactionTable, file = file.path(outFile,
        • line 96: "link.tsv"), row.names = FALSE, col.names = TRUE,
        • line 97: quote = FALSE, sep = "\t")
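
The repeated existence checks and read.table() calls listed above could be factored into one helper. The sketch below is an assumption about how that might look: the name readAssociationTable and its arguments are hypothetical, and a temporary directory stands in for inFile.

```r
# Hypothetical helper; not part of Site2Target.
readAssociationTable <- function(inFile, fileName, what) {
    path <- file.path(inFile, fileName)
    if (!file.exists(path)) {
        stop("The ", what, " file does not exist in the directory")
    }
    utils::read.table(path, header = TRUE, sep = "\t")
}

# Usage sketch with a temporary directory standing in for inFile.
inFile <- file.path(tempdir(), "geneWiseAssociation")
dir.create(inFile, showWarnings = FALSE)
utils::write.table(data.frame(geneNames = c("g1", "g2")),
                   file = file.path(inFile, "gene.tsv"),
                   sep = "\t", quote = FALSE, row.names = FALSE)
geneTable <- readAssociationTable(inFile, "gene.tsv", "gene information")
```

Both addColumn2geneWiseAssociation and addRelation2geneWiseAssociation could then call the helper three times instead of repeating the same block.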
    • repetition in addRelation2geneWiseAssociation, extendSitesInGivenRegions, genewiseAssociation, getTargetGenesNumber, getTargetGenesPvals, getTargetGenesPvalsWithDNAInteractions, getTargetGenesPvalsWithIntensities, genewiseAssociation, and getTargetGenesPvalsWithIntensitiesAndDNAInteractions
      • in addRelation2geneWiseAssociation
        • line 46: mapPeak <- GenomicRanges::findOverlaps(peakCoord,
        • line 47: strand1)
        • line 48: mapPeakInds <- S4Vectors::queryHits(mapPeak)
        • line 49: mapPeakStrandInds <- S4Vectors::subjectHits(mapPeak)
        • line 50: mapGene <- GenomicRanges::findOverlaps(geneCoord,
        • line 51: strand2)
        • line 52: mapGeneInds <- S4Vectors::queryHits(mapGene)
        • line 53: mapGeneStrandInds <- S4Vectors::subjectHits(mapGene)
        • line 54: commonStrands <- intersect(mapPeakStrandInds, mapGeneStrandInds)
        • line 55: commonStrandsNumber <- length(commonStrands)
        • line 56: for (i in seq_len(commonStrandsNumber)) {
        • line 57: currentStrand <- commonStrands[i]
        • line 58: strandIndsPeak <- which((mapPeakStrandInds ==
        • line 59: currentStrand) == TRUE)
        • line 60: currentPeakInds <- mapPeakInds[strandIndsPeak]
        • line 61: currentPeak <- peakTable$peakName[currentPeakInds]
        • line 62: strandIndsGene <- which((mapGeneStrandInds ==
        • line 63: currentStrand) == TRUE)
        • line 64: currentGeneInds <- mapGeneInds[strandIndsGene]
        • line 65: currentGene <- geneTable$geneNames[currentGeneInds]
      • in extendSitesInGivenRegions
        • line 1: function (givenRegions, sites, distance = 1e+05)
        • line 2: {
        • line 3: extendRegions <- GenomicRanges::GRanges(seqnames = S4Vectors::Rle(GenomeInfoDb::seqnames(sites)),
        • line 4: ranges = IRanges::IRanges(BiocGenerics::start(sites),
        • line 5: end = BiocGenerics::end(sites)) + distance)
        • line 6: siteRegionOverlap <- GenomicRanges::findOverlaps(sites,
        • line 7: givenRegions)
      • in genewiseAssociation
        • line 32: if (associationBy == "distance") {
        • line 33: extendRegions <- GenomicRanges::GRanges(seqnames = S4Vectors::Rle(GenomeInfoDb::seqnames(peakCoordinates)),
        • line 34: ranges = IRanges::IRanges(BiocGenerics::start(peakCoordinates),
        • line 35: end = BiocGenerics::end(peakCoordinates)) + distance)
        • line 36: }
        • line 37: else if (associationBy == "regions") {
        • line 38: givenRegionNumber <- length(givenRegions)
        • line 39: if (givenRegionNumber == 0) {
        • line 41: }
        • line 42: extendRegions <- extendSitesInGivenRegions(sites = peakCoordinates,
        • line 43: distance = distance, givenRegions = givenRegions)
        • line 44: }
        • line 46: extendRegions <- GenomicRanges::GRanges(seqnames = S4Vectors::Rle(GenomeInfoDb::seqnames(peakCoordinates)),
        • line 47: ranges = IRanges::IRanges(BiocGenerics::start(peakCoordinates),
        • line 48: end = BiocGenerics::end(peakCoordinates)) + distance)
        • line 49: }
        • line 50: else {
        • line 51: stop("Peak to gene is associated either by distance or regions")
        • line 52: }
        • line 53: map <- GenomicRanges::findOverlaps(geneCoordinates, extendRegions)
        • line 78: }
        • line 79: strand1Center <- getCenterOfPeaks(strand1)
        • line 80: center1 <- BiocGenerics::start(strand1Center)
        • line 81: rm(strand1Center)
        • line 82: gc()
        • line 83: strand2Center <- getCenterOfPeaks(strand2)
        • line 84: center2 <- BiocGenerics::start(strand2Center)
        • line 85: rm(strand2Center)
        • line 86: gc()
        • line 87: D <- abs(center1 - center2)
        • line 88: distantInteractomInds <- which((D > (distance - 1)) ==
        • line 89: TRUE)
        • line 90: strand1 <- strand1[distantInteractomInds]
      • in getTargetGenesNumber
        • line 1: sites = NA, distance = 50000)
        • line 2:{
        • line 3: geneNumber <- length(geneCoordinates)
        • line 4: if (geneNumber < 2) {
        • line 5: stop("At least two genes corrdinats must be provided")
        • line 6: }
        • line 7: siteNumber <- length(sites)
        • line 8: if (siteNumber < 2) {
        • line 9: stop("At least two sites must be provided")
        • line 10: }
        • line 11: extendRegions <- GenomicRanges::GRanges(seqnames = S4Vectors::Rle(GenomeInfoDb::seqnames(sites)),
        • line 12: ranges = IRanges::IRanges(BiocGenerics::start(sites),
        • line 13: end = BiocGenerics::end(sites)) + distance)
        • line 14: targets <- S4Vectors::queryHits(GenomicRanges::findOverlaps(geneCoordinates,
        • line 15: extendRegions))
      • in getTargetGenesPvals
        • line 2: distance = 50000, givenRegions = NA)
        • line 3:{
        • line 4: geneNumber <- length(geneCoordinates)
        • line 5: if (geneNumber < 2) {
        • line 6: stop("At least two genes corrdinats must be provided")
        • line 7: }
        • line 8: siteNumber <- length(sites)
        • line 9: if (siteNumber < 2) {
        • line 10: stop("At least two sites must be provided")
        • line 11: }
        • line 12: sites <- getCenterOfPeaks(sites)
        • line 13: if (associationBy == "distance") {
        • line 16: }
        • line 17: else if (associationBy == "regions") {
        • line 18: givenRegionNumber <- length(givenRegions)
        • line 19: if (givenRegionNumber < 2) {
        • line 20: if (is.na(givenRegions)) {
        • line 21: stop("For extending sites in regions, the regions must be provided")
        • line 22: }
        • line 23: }
        • line 24: extendRegions <- extendSitesInGivenRegions(sites = sites,
        • line 25: distance = distance, givenRegions = givenRegions)
        • line 28: }
        • line 29: else {
        • line 30: stop("Peak to gene is associated either by distance or regions")
        • line 31: }
        • line 32: eps <- 1
        • line 33: log2ScaleCount <- log2(targetNumber + eps)
        • line 34: upperbound <- 2^(ceiling(stats::quantile(log2ScaleCount,
        • line 35: 0.75) + 3 * stats::IQR(log2ScaleCount)))
        • line 36: if (upperbound < 4) {
        • line 37: warning("Insufficeint interactions to model")
        • line 38: acceptedInds <- c(1:geneNumber)
        • line 39: }
        • line 40: else {
        • line 41: acceptedInds <- which((targetNumber < upperbound) ==
        • line 42: TRUE)
        • line 43: }
        • line 44: if (dist == "negative binomial") {
        • line 45: flag <- TRUE
        • line 46: try({
        • line 47: distNB <- MASS::fitdistr(targetNumber[acceptedInds],
        • line 48: densfun = "negative binomial")
        • line 49: pvals <- stats::pnbinom(targetNumber, size = (distNB$estimate)[1],
        • line 50: mu = (distNB$estimate)[2], lower.tail = FALSE)
        • line 51: flag <- FALSE
        • line 52: })
        • line 53: if (flag) {
        • line 54: stop("negative binomial distribution could not be fitted try poisson")
        • line 55: }
        • line 56: }
        • line 57: else if (dist == "poisson") {
        • line 58: distP <- MASS::fitdistr(targetNumber[acceptedInds], densfun = "poisson")
        • line 59: pvals <- stats::ppois(targetNumber, lambda = as.numeric(distP[1]),
        • line 60: lower.tail = FALSE)
        • line 61: }
        • line 62: else {
        • line 63: stop("The distribution should be either negative binomial or poisson")
        • line 64: }
        • line 65: return(pvals)
      • in getTargetGenesPvalsWithDNAInteractions
        • line 3: {
        • line 4: geneNumber <- length(geneCoordinates)
        • line 5: if (geneNumber < 2) {
        • line 6: stop("At least two genes corrdinats must be provided")
        • line 7: }
        • line 8: siteNumber <- length(sites)
        • line 9: if (siteNumber < 2) {
        • line 10: stop("At least two sites must be provided")
        • line 11: }
        • line 12: LenStrand1 <- length(strand1)
        • line 13: LenStrand2 <- length(strand2)
        • line 14: if (LenStrand1 < 2) {
        • line 15: stop("At least two DNA-DNA interactions must be provided")
        • line 16: }
        • line 17: if (LenStrand1 != LenStrand2) {
        • line 18: stop("The length of Gstrand and Sstrand must be equal")
        • line 19: }
        • line 20: sites <- getCenterOfPeaks(sites)
        • line 21: targetNumber <- rep(0, geneNumber)
        • line 22: if (distance > -1) {
        • line 23: strand1Center <- getCenterOfPeaks(strand1)
        • line 24: center1 <- BiocGenerics::start(strand1Center)
        • line 25: rm(strand1Center)
        • line 26: gc()
        • line 27: strand2Center <- getCenterOfPeaks(strand2)
        • line 28: center2 <- BiocGenerics::start(strand2Center)
        • line 29: rm(strand2Center)
        • line 30: gc()
        • line 31: D <- abs(center1 - center2)
        • line 32: distantInteractomInds <- which((D > (distance - 1)) ==
        • line 33: TRUE)
        • line 34: InteractionNumber <- length(distantInteractomInds)
        • line 35: if (InteractionNumber > 0) {
        • line 36: strand1 <- strand1[distantInteractomInds]
        • line 37: strand2 <- strand2[distantInteractomInds]
        • line 38: }
        • line 41: }
        • line 42: geneCoordinates <- getCenterOfPeaks(geneCoordinates)
        • line 43: InteractionNumber <- LenStrand1
        • line 44: mapSite <- GenomicRanges::findOverlaps(sites, strand1)
        • line 45: mapSiteInds <- S4Vectors::queryHits(mapSite)
        • line 46: mapSiteStrandInds <- S4Vectors::subjectHits(mapSite)
        • line 47: mapGene <- GenomicRanges::findOverlaps(geneCoordinates,
        • line 48: strand2)
        • line 49: mapGeneInds <- S4Vectors::queryHits(mapGene)
        • line 50: mapGeneStrandInds <- S4Vectors::subjectHits(mapGene)
        • line 51: commonStrands <- intersect(mapSiteStrandInds, mapGeneStrandInds)
        • line 52: targetNumberDNAIntacts <- rep(0, geneNumber)
        • line 53: for (i in seq_len(geneNumber)) {
        • line 54: tmpGeneInds <- which((mapGeneInds == i) == TRUE)
        • line 55: tmpGeneIndsNum <- length(tmpGeneInds)
        • line 56: if (tmpGeneIndsNum > 0) {
        • line 57: tmpStrandInds <- mapGeneStrandInds[tmpGeneInds]
        • line 58: intersect(tmpStrandInds, commonStrands)
        • line 59: tmpSiteInds <- match(tmpStrandInds, mapSiteStrandInds)
        • line 60: tmpSiteInds <- tmpSiteInds[!is.na(tmpSiteInds)]
        • line 63: }
        • line 64: }
        • line 65: targetNumber <- targetNumber + targetNumberDNAIntacts
        • line 66: eps <- 1
        • line 67: log2ScaleCount <- log2(targetNumber + eps)
        • line 84: mu = (distNB$estimate)[2], lower.tail = FALSE)
        • line 85: flag <- FALSE
        • line 86: })
        • line 87: if (flag) {
      • in getTargetGenesPvalsWithIntensities
        • line 2: sites = NA, distance = 50000, givenRegions = NA)
        • line 3: {
        • line 4: geneNumber <- length(geneCoordinates)
        • line 5: if (geneNumber < 10) {
        • line 6: stop("At least ten genes corrdinats must be provided")
        • line 7: }
        • line 8: siteNumber <- length(sites)
        • line 9: if (siteNumber < 10) {
        • line 10: stop("At least ten sites must be provided")
        • line 11: }
        • line 12: sites <- getCenterOfPeaks(sites)
        • line 13: if (associationBy == "distance") {
        • line 14: extendRegions <- GenomicRanges::GRanges(seqnames = S4Vectors::Rle(GenomeInfoDb::seqnames(sites)),
        • line 15: ranges = IRanges::IRanges(BiocGenerics::start(sites),
        • line 16: end = BiocGenerics::end(sites)) + distance)
        • line 17: }
        • line 18: else if (associationBy == "regions") {
        • line 19: givenRegionNumber <- length(givenRegions)
        • line 20: if (givenRegionNumber < 2) {
        • line 21: if (is.na(givenRegions)) {
        • line 22: stop("For extending sites in regions, the regions must be provided")
        • line 23: }
        • line 24: }
        • line 25: extendRegions <- extendSitesInGivenRegions(sites = sites,
        • line 26: distance = distance, givenRegions = givenRegions)
        • line 27: }
        • line 28: else {
        • line 29: stop("Peak to gene is associated either by distance or regions")
        • line 30: }
        • line 31: overlaps <- GenomicRanges::findOverlaps(geneCoordinates,
        • line 32: extendRegions)
        • line 34: overlapsNumber <- length(overlaps)
        • line 35: if (overlapsNumber > 10) {
        • line 36: for (i in seq_len(overlapsNumber)) {
        • line 37: targetNumber[S4Vectors::queryHits(overlaps[i])] <- targetNumber[S4Vectors::queryHits(overlaps[i])] +
        • line 38: intensities[S4Vectors::subjectHits(overlaps[i])]
        • line 39: }
        • line 40: }
        • line 41: else {
        • line 42: stop("Genes and sites are far from each other")
        • line 43: }
        • line 44: eps <- 1
        • line 45: log2ScaleCount <- log2(targetNumber + eps)
        • line 46: nonZeroInds <- which((log2ScaleCount > 0) == TRUE)
        • line 47: lowerbound <- ceiling(stats::quantile(log2ScaleCount[nonZeroInds],
        • line 48: 0.25) - 1.5 * stats::IQR(log2ScaleCount[nonZeroInds]))
        • line 49: upperbound <- ceiling(stats::quantile(log2ScaleCount[nonZeroInds],
        • line 50: 0.75) + 1.5 * stats::IQR(log2ScaleCount[nonZeroInds]))
        • line 51: acceptedInds <- intersect(which((log2ScaleCount < upperbound) ==
        • line 52: TRUE), which((log2ScaleCount > lowerbound) == TRUE))
        • line 53: flag <- TRUE
        • line 54: try({
        • line 55: distN <- MASS::fitdistr(log2ScaleCount[acceptedInds],
        • line 56: densfun = "normal")
        • line 57: pvals <- stats::pnorm(log2ScaleCount, mean = (distN$estimate)[1],
        • line 58: sd = (distN$estimate)[2], lower.tail = FALSE)
        • line 59: flag <- FALSE
        • line 60: })
        • line 61: if (flag) {
        • line 62: warning("Low number of sites and genes")
        • line 63: try({
        • line 64: distN <- MASS::fitdistr(log2ScaleCount, densfun = "normal")
        • line 65: pvals <- stats::pnorm(log2ScaleCount, mean = (distN$estimate)[1],
        • line 66: sd = (distN$estimate)[2], lower.tail = FALSE)
        • line 67: flag <- FALSE
        • line 68: })
        • line 69: }
        • line 70: if (flag) {
        • line 71: stop("Cannot fit the log-normal distirbution. Use more sites and genes")
        • line 72: }
        • line 73: return(pvals)
      • in genewiseAssociation
        • line 98: mapGeneInds <- S4Vectors::queryHits(mapGene)
        • line 99: mapGeneStrandInds <- S4Vectors::subjectHits(mapGene)
        • line 100: commonStrands <- intersect(mapPeakStrandInds, mapGeneStrandInds)
        • line 101: commonStrandsNumber <- length(commonStrands)
        • line 102: geneNamesDistal <- NULL
        • line 104: distanceDistal <- NULL
        • line 105: for (i in seq_len(commonStrandsNumber)) {
        • line 106: currentStrand <- commonStrands[i]
        • line 107: strandIndsPeak <- which((mapPeakStrandInds == currentStrand) ==
        • line 108: TRUE)
        • line 109: currentPeakInds <- mapPeakInds[strandIndsPeak]
        • line 110: currentPeak <- peakNames[currentPeakInds]
        • line 112: currentPeakNumber <- length(currentPeak)
        • line 113: strandIndsGene <- which((mapGeneStrandInds == currentStrand) ==
        • line 114: TRUE)
        • line 115: currentGeneInds <- mapGeneInds[strandIndsGene]
        • line 116: currentGene <- geneNames[currentGeneInds]
      • in getTargetGenesPvalsWithIntensitiesAndDNAInteractions
        • line 3: {
        • line 4: geneNumber <- length(geneCoordinates)
        • line 5: if (geneNumber < 10) {
        • line 6: stop("At least ten genes corrdinats must be provided")
        • line 7: }
        • line 8: siteNumber <- length(sites)
        • line 9: if (siteNumber < 10) {
        • line 10: stop("At least ten sites must be provided")
        • line 11: }
        • line 12: LenStrand1 <- length(strand1)
        • line 13: LenStrand2 <- length(strand2)
        • line 14: if (LenStrand1 < 2) {
        • line 15: stop("At least two DNA-DNA interactions must be provided")
        • line 16: }
        • line 17: if (LenStrand1 != LenStrand2) {
        • line 18: stop("The length of Gstrand and Sstrand must be equal")
        • line 19: }
        • line 20: sites <- getCenterOfPeaks(sites)
        • line 21: targetNumber <- rep(0, geneNumber)
        • line 22: if (distance > -1) {
        • line 23: strand1Center <- getCenterOfPeaks(strand1)
        • line 24: center1 <- BiocGenerics::start(strand1Center)
        • line 25: rm(strand1Center)
        • line 26: gc()
        • line 27: strand2Center <- getCenterOfPeaks(strand2)
        • line 28: center2 <- BiocGenerics::start(strand2Center)
        • line 29: rm(strand2Center)
        • line 30: gc()
        • line 31: D <- abs(center1 - center2)
        • line 32: distantInteractomInds <- which((D > (distance - 1)) ==
        • line 33: TRUE)
        • line 34: InteractionNumber <- length(distantInteractomInds)
        • line 35: if (InteractionNumber > 0) {
        • line 36: strand1 <- strand1[distantInteractomInds]
        • line 37: strand2 <- strand2[distantInteractomInds]
        • line 38: }
        • line 39: extendRegions <- GenomicRanges::GRanges(seqnames = S4Vectors::Rle(GenomeInfoDb::seqnames(sites)),
        • line 40: ranges = IRanges::IRanges(BiocGenerics::start(sites),
        • line 41: end = BiocGenerics::end(sites)) + distance)
        • line 44: overlapsNumber <- length(overlaps)
        • line 45: if (overlapsNumber > 10) {
        • line 46: for (i in seq_len(overlapsNumber)) {
        • line 47: targetNumber[S4Vectors::queryHits(overlaps[i])] <- targetNumber[S4Vectors::queryHits(overlaps[i])] +
        • line 48: intensities[S4Vectors::subjectHits(overlaps[i])]
        • line 49: }
        • line 50: }
        • line 51: else {
        • line 52: stop("Genes and sites are far from each other")
        • line 53: }
        • line 54: }
        • line 55: geneCoordinates <- getCenterOfPeaks(geneCoordinates)
        • line 56: InteractionNumber <- length(strand1)
        • line 57: mapSite <- GenomicRanges::findOverlaps(sites, strand1)
        • line 58: mapSiteInds <- S4Vectors::queryHits(mapSite)
        • line 59: mapSiteStrandInds <- S4Vectors::subjectHits(mapSite)
        • line 60: mapGene <- GenomicRanges::findOverlaps(geneCoordinates,
        • line 61: strand2)
        • line 62: mapGeneInds <- S4Vectors::queryHits(mapGene)
        • line 63: mapGeneStrandInds <- S4Vectors::subjectHits(mapGene)
        • line 64: commonStrands <- intersect(mapSiteStrandInds, mapGeneStrandInds)
        • line 65: targetNumberDNAIntacts <- rep(0, geneNumber)
        • line 66: for (i in seq_len(geneNumber)) {
        • line 67: tmpGeneInds <- which((mapGeneInds == i) == TRUE)
        • line 68: tmpGeneIndsNumber <- length(tmpGeneInds)
        • line 69: if (tmpGeneIndsNumber > 0) {
        • line 70: tmpStrandInds <- mapGeneStrandInds[tmpGeneInds]
        • line 71: intersect(tmpStrandInds, commonStrands)
        • line 72: tmpSiteInds <- match(tmpStrandInds, mapSiteStrandInds)
        • line 73: tmpSiteInds <- tmpSiteInds[!is.na(tmpSiteInds)]
        • line 78: }
        • line 79: }
        • line 80: }
        • line 81: targetNumber <- targetNumber + targetNumberDNAIntacts
        • line 82: eps <- 1
        • line 83: log2ScaleCount <- log2(targetNumber + eps)
        • line 84: nonZeroInds <- which((log2ScaleCount > 0) == TRUE)
        • line 85: lowerbound <- ceiling(stats::quantile(log2ScaleCount[nonZeroInds],
        • line 86: 0.25) - 1.5 * stats::IQR(log2ScaleCount[nonZeroInds]))
        • line 87: upperbound <- ceiling(stats::quantile(log2ScaleCount[nonZeroInds],
        • line 88: 0.75) + 1.5 * stats::IQR(log2ScaleCount[nonZeroInds]))
        • line 89: acceptedInds <- intersect(which((log2ScaleCount < upperbound) ==
        • line 90: TRUE), which((log2ScaleCount > lowerbound) == TRUE))
        • line 91: flag <- TRUE
        • line 92: try({
        • line 93: distN <- MASS::fitdistr(log2ScaleCount[acceptedInds],
        • line 94: densfun = "normal")
        • line 95: pvals <- stats::pnorm(log2ScaleCount, mean = (distN$estimate)[1],
        • line 96: sd = (distN$estimate)[2], lower.tail = FALSE)
        • line 97: flag <- FALSE
        • line 98: })
        • line 99: if (flag) {
        • line 100: warning("Low number of sites and genes")
        • line 101: try({
        • line 102: distN <- MASS::fitdistr(log2ScaleCount, densfun = "normal")
        • line 103: pvals <- stats::pnorm(log2ScaleCount, mean = (distN$estimate)[1],
        • line 104: sd = (distN$estimate)[2], lower.tail = FALSE)
        • line 105: flag <- FALSE
        • line 106: })
        • line 107: }
        • line 108: if (flag) {
        • line 109: stop("Cannot fit the log-normal distirbution. Use more sites and genes")
        • line 110: }
        • line 111: return(pvals)
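
As a side note on the listing above: the loop in getTargetGenesPvalsWithIntensities (and its counterpart in getTargetGenesPvalsWithIntensitiesAndDNAInteractions) adds one intensity per overlap by indexing the Hits object row by row. A minimal base-R sketch of the same accumulation without the loop is shown below. The toy vectors `queryIdx` and `subjectIdx` are assumptions standing in for `S4Vectors::queryHits(overlaps)` and `S4Vectors::subjectHits(overlaps)`; this is only an illustration, not the package's code.

```r
# Toy stand-ins (assumed values) for queryHits(overlaps) and subjectHits(overlaps)
queryIdx    <- c(1L, 1L, 3L)   # gene index of each overlap
subjectIdx  <- c(2L, 3L, 1L)   # site index of each overlap
intensities <- c(5, 2, 4)      # one intensity per site
geneNumber  <- 4L

# Sum the intensities of all sites overlapping each gene in one pass
targetNumber <- numeric(geneNumber)
sums <- tapply(intensities[subjectIdx], queryIdx, sum)
targetNumber[as.integer(names(sums))] <- sums

targetNumber
```

Genes with no overlapping site keep a value of 0, which matches the behaviour of the original loop starting from `rep(0, geneNumber)`.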

Documentation

  • Important: Vignette should have an Installation section.
    • rmd file vignettes/Site2Target.Rmd
  • Important: Please include Bioconductor installation instructions using BiocManager.
    • rmd file vignettes/Site2Target.Rmd
  • Note: Vignette includes motivation for submitting to Bioconductor as part of the abstract/intro of the main vignette.
    • rmd file vignettes/Site2Target.Rmd
  • Note: typos (word: found in):
    • cooridnates: granges2String.Rd:5, string2Granges.Rd:5
    • granes: granges2String.Rd:5
    • grangess: string2Granges.Rd:16
    • nromal: description:1
    • trings: string2Granges.Rd:16
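
For the two "Important" items on installation, a chunk along the following lines in an Installation section of vignettes/Site2Target.Rmd would normally be enough. It is a sketch that assumes the package is installed under the name `Site2Target` once accepted; in the vignette the chunk would typically be marked `eval=FALSE` so that it is not run during the build.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install the package from Bioconductor (package name assumed from this submission)
BiocManager::install("Site2Target")
```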
