OrderCpGsByLocation with ignore.strand = TRUE #9

Open
Ning-L opened this issue Dec 15, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@Ning-L
Contributor

Ning-L commented Dec 15, 2021

Hi,

I noticed that some regions returned by lmmTestAllRegions() have a start position greater than their end position. For example, here is one of my results:

   chrom start   end nCpGs    Estimate    StdErr       Stat    pValue FDR
1: chr18 10922 10862     4 -0.01854889 0.0220416 -0.8415399 0.4000455   1     

Here is the extract of EPIC.hg19.manifest from sesameData for the 4 CpGs in this region:

GRanges object with 4 ranges and 1 metadata column:
             seqnames      ranges strand | address_A
                <Rle>   <IRanges>  <Rle> | <integer>
  cg23708725    chr18 10922-10923      + |  56710277
  cg23947066    chr18 10930-10931      + |  80737195
  cg07138201    chr18 10935-10936      + |  36755955
  cg00703566    chr18 10862-10863      - |  40648584
  -------
  seqinfo: 26 sequences from an unspecified genome; no seqlengths

After looking at the source code, it seems that you use the default settings of sort(), which orders a GRanges object first by seqnames, then by strand, then by start, and finally by width.

CpGs.gr <- sort(
  CpGlocations.gr[ CpGs_char[goodCpGs_lgl] ]
)

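Because the identified co-methylated CpGs are not always on the same strand, the default sort() can place a CpG with a smaller coordinate after CpGs with larger ones. In the example above, the "-" strand CpG at 10862 sorts after the three "+" strand CpGs, so when the region name is built from the sorted CpGs, the region gets a start (10922) greater than its end (10862).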

coMethDMR/R/lmmTest.R

Lines 99 to 103 in 34553b9

regionName <- NameRegion(
  OrderCpGsByLocation(
    betaOne_df$ProbeID, genome, arrayType, output = "dataframe"
  )
)

Would you consider using sort(..., ignore.strand = TRUE) to eliminate this problem?
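To make the difference concrete, here is a minimal sketch that rebuilds the four CpGs from the manifest extract above as a GRanges object and sorts it both ways. It assumes the GenomicRanges package is installed; the object names (cpgs, byStrand, byPosition) are illustrative and do not come from the coMethDMR source.

```r
library(GenomicRanges)

# The four CpGs in the region, with coordinates copied from the manifest extract
cpgs <- GRanges(
  seqnames = "chr18",
  ranges = IRanges(
    start = c(10922, 10930, 10935, 10862),
    end   = c(10923, 10931, 10936, 10863),
    names = c("cg23708725", "cg23947066", "cg07138201", "cg00703566")
  ),
  strand = c("+", "+", "+", "-")
)

# Default sort: seqnames, then strand, then start, then width.
# The "-" strand CpG at 10862 ends up last.
byStrand <- sort(cpgs)

# Strand-agnostic sort: ordering depends on position only.
# The CpG at 10862 now comes first.
byPosition <- sort(cpgs, ignore.strand = TRUE)

names(byStrand)
# "cg23708725" "cg23947066" "cg07138201" "cg00703566"
names(byPosition)
# "cg00703566" "cg23708725" "cg23947066" "cg07138201"
```

With the default ordering, taking the first start and the last position reproduces the 10922 to 10862 region reported above; with ignore.strand = TRUE the region begins at 10862.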

@gabrielodom
Member

Thanks to @Ning-L, we are closing this with pull request #10.

@gabrielodom
Member

I need to add this ignoreStrand argument to any function in coMethDMR:: that calls OrderCpGsByLocation() directly or indirectly. So far, we have only fixed: GetCpGsInRegion(), lmmTest(), lmmTestAllRegions(), OrderCpGsByLocation(), and WriteCloseByAllRegions().

I will wait until I finish all the work for https://github.com/TransBioInfoLab/coMethDMR/tree/update_sesame before I fix this.
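As a rough illustration of what passing the argument through a call chain involves, the base R sketch below uses two hypothetical helpers, order_by_location() and name_region(). These are stand-ins written for this example only; they are not the coMethDMR functions, and their signatures and the region name format are assumptions. The point is simply that every caller has to accept ignoreStrand and forward it explicitly.

```r
# Hypothetical stand-in for an ordering helper (NOT coMethDMR code).
# Orders CpGs by chromosome and position, optionally grouping by strand first.
order_by_location <- function(cpgs_df, ignoreStrand = TRUE) {
  if (ignoreStrand) {
    ord <- order(cpgs_df$chr, cpgs_df$pos)
  } else {
    # Mimic the strand-first ordering: "+" before "-" before "*"
    strand_fct <- factor(cpgs_df$strand, levels = c("+", "-", "*"))
    ord <- order(cpgs_df$chr, strand_fct, cpgs_df$pos)
  }
  cpgs_df[ord, , drop = FALSE]
}

# Hypothetical caller: it must expose ignoreStrand and forward it,
# otherwise the helper's default silently applies.
name_region <- function(cpgs_df, ignoreStrand = TRUE) {
  ordered_df <- order_by_location(cpgs_df, ignoreStrand = ignoreStrand)
  paste0(
    ordered_df$chr[1], ":",
    ordered_df$pos[1], "-", ordered_df$pos[nrow(ordered_df)]
  )
}

cpgs_df <- data.frame(
  chr = "chr18",
  pos = c(10922, 10930, 10935, 10862),
  strand = c("+", "+", "+", "-"),
  stringsAsFactors = FALSE
)

name_region(cpgs_df, ignoreStrand = FALSE)  # "chr18:10922-10862"
name_region(cpgs_df, ignoreStrand = TRUE)   # "chr18:10862-10935"
```

Any function that reaches the ordering step indirectly needs the same treatment, which is why the fix has to be applied across the whole call chain rather than in one place.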

gabrielodom reopened this Mar 29, 2022