Skip to content

Commit

Permalink
Merge pull request #80 from databio/dev
Browse files Browse the repository at this point in the history
Dev
  • Loading branch information
vpnagraj authored Apr 24, 2018
2 parents fb91669 + c164c4a commit 1be70d5
Show file tree
Hide file tree
Showing 16 changed files with 2,677 additions and 233 deletions.
9 changes: 6 additions & 3 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ inst/doc
apps/LOLAweb/cache/*
*.bed
apps/LOLAweb/lola_vignette_data/*.bed
apps/LOLAweb/universes/*
apps/LOLAweb/reference/
apps/LOLAwebuserSets/*
apps/LOLAweb/universes
apps/LOLAweb/reference
apps/LOLAweb/cache
apps/LOLAweb/shinylog
apps/LOLAweb/results
apps/LOLAweb/userSets
1 change: 1 addition & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ services:
- docker

before_install:
- sudo pip install --upgrade pip
- pip install --user awscli
- export PATH=$PATH:$HOME/.local/bin
- echo $DOCKER_PASSWORD | docker login -u "$DOCKER_USERNAME" --password-stdin
Expand Down
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,6 @@ This repository contains two components: 1) the shiny [app source code](apps/LOL

LOLAweb is implemented as an interactive Shiny app. You can run this app locally by following the [instructions in the appfolder](apps/LOLAweb/).

## Dockerfile
## Docker

We have also produced a Dockerfile and host an image on dockerhub (`databio/lolaweb`). You can read how to run LOLAweb locally in a docker container by reading the [instructions in the docker folder](docker/).
We have also produced a Dockerfile and published an image on Docker Hub (`databio/lolaweb`). Learn how to run LOLAweb locally in a docker container by reading the [instructions in the docker folder](docker/).
63 changes: 62 additions & 1 deletion apps/LOLAweb/about.html
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@
<ul>
<li><a href="#what">What is LOLAweb?</a></li>
<li><a href="#how">How do I use LOLAweb?</a></li>
<li><a href="#uni">What universe should I choose?</a></li>
<li><a href="#database">What database should I choose?</a></li>
<li><a href="#share">Can I share my LOLAweb results with others?</a></li>
<li><a href="#results">How long are LOLAweb results kept?</a></li>
</ul>
Expand All @@ -26,10 +28,69 @@ <h3>How do I use LOLAweb?</h3>

<p>Hit "Run LOLA" and watch the magic happen! The results will be displayed as an interactive interface letting you sort and filter the top hits. The schematic below demonstrates the whole process:</p>

<img alt="LOLAweb worfklow" title="LOLAweb workflow" src="LOLAweb-workflow.png" style="height:auto; width:50%; min-width:50%;">
<img alt="LOLAweb workflow" title="LOLAweb workflow" src="LOLAweb-workflow.png" style="height:auto; width:50%; min-width:50%;">

<hr>

<a name="uni"> </a>
<h3>What universe should I choose?</h3>

<p>One of the key questions when you run LOLA is what your background set, or "universe" is. The universe is a bit open-ended, and it's reasonable to try a few different things. Changing the universe isn't right or wrong, it just changes the question you are asking. The universe set is tested for overlap with the database, and the counts are used in the contingency tables that determine significance for your user sets. The reason this is important is that if you have some regions which were never really possible to end up in a region set of interest, it's unfair to penalize your regions for not overlapping those regions in the database, changing the results of the significance test.</p>

<img src="universe_selection.svg" style="padding:35px; float:right; max-width:50%;">

<p>Let's imagine you've done an experiment where you're testing how some epigenomic signal (say, H3K27ac or DNA methylation) responds to a perturbation. You end up with a set of genomic regions that were covered by your assay, along with two subsets of interest: those that increased, and those that decreased in signal (see figure).</p>

<p>In this case, one approach is to think of this as two user query sets: one for each subset of regions that change. Now, what should your universe be? Well, you have a few options. Let's go through 4 common universe choices see what they mean about the significance test you're performing. The four choices move in increasing order of specificity:

<p><b>Choice 1: genome tiles</b>. LOLAweb lets you choose tiling regions of various sizes. These tiles are little better than generic splits of the genome because they at least account for reference assembly gaps. But otherwise, this is a pretty basic background set which doesn't restrict the question at all. This would be a useful universe to choose if you are not sure what to expect, and your experiment could have included any genomic region as output. The result will tell you what database region sets are enriched in your query sets generally, across the whole genome.</p>

<p><b>Choice 2: any regulatory elements</b>. A bit more restricted universe set is the set of active regulatory elements. LOLAweb includes a manually curated universe set that was derived by merging all of the DNAse hypersensitivity data from over 100 cell types from the ENCODE project. This set will restrict your significance test to only active regulatory elements, so it's a reasonable background set if you're looking at ChIP-seq data or some other active signal, but may not be relevant for every experiment type. In practice, we find this to be a useful first pass test for many experiments. Relative to the tiling approach, this, for example, won't penalize your overlap test for not overlapping a database region that's hidden away in a heterochromatic region. It yields which database region sets are enriched in your query set, but only considering active parts of the genome. An advantage of this approach is that it's easy: we've already defined that set for you and you can just select it. A disadvantage is that you're restricted to elements that were covered in the considered cell types (although, offsetting that is the point that really specialized stuff is less likely to be in the database, anyway).</p>

<p><b>Choice 3: all regions covered</b>. Probably a better choice (though requiring a bit more work) is to derive the universe from your actual data. The natural choice would be to use the set of regions your experiment actually covered. You can think of this universe as the set of regions you tested for possible inclusion in your user sets; or, in other words, it is the restricted background set of regions that were tested, including everything in your regions of interest as well as those that were tested but did not get included. If you can easily come up with such a test for your problem, this is a really good choice for background universe. The resulting test will give you overlaps with database regions without penalizing for areas that weren't even covered in your experiment anyway. This asks: what's special about my regions that increased, versus all the regions that didn't?</p>

<p><b>Choice 4: changing regions only</b> Finally, you could make a very restrictive universe by just combining your two region sets of interest. In this case, it would be the set of regions that either increased or decreased. The result of this test will give you the enrichment of the increased regions <i>relative to changing regions</i>. This is a subtly different question to ask than using the set of all covered regions (choice 3). The LOLA R package provides an easy way to do this with the <i>buildRestrictedUniverse</i> function, and you can accomplish the same thing with the radio button on LOLAweb. It assumes that we're only interested in regions that change, then we ask the question: what's special about the ones that increased?</p>

<p>In the case of DNA methylation: all regions that had reasonable coverage of methylation reads are your universe, and those that were either highly methylated or lowly methylated (or differentially methylated) would be your subsets of interest. It's the set including all genes that had enough reads that they could have been differentially methylated, even if they weren't.</p>

<hr>

<a name="database"> </a>
<h3>What database should I choose?</h3>

LOLAweb provides access to a few curated databases we have put together for you. If you want to run LOLAweb on a custom database, that's also possible, but you'd have to download the app and run it locally. We provide docker containers that make this easy. Alternatively, the LOLA R package can work with custom databases and may be simpler, if you don't want to deal with a containerized setup or running your own local shiny server.

But because databases are too large , with LOLAwebWhen you select a reference

Current contents of LOLA core:

* hg19/hg38
1. Transcription Factor binding sites from [ENCODE](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/)
2. Tissue clustered DNase hypersensitive sites from [Sheffield et al. (2013)](http://dnase.genome.duke.edu)
3. [Codex database](http://codex.stemcells.cam.ac.uk/)
4. A collection of UCSC feature tables (like CpG islands)
5. Cistrome database from [Cistrome](http://dx.doi.org/10.1186/gb-2011-12-8-r83)
6. Epigenome databases from [Cistrome](http://dx.doi.org/10.1186/gb-2011-12-8-r83)
* mm10
1. [Codex database](http://codex.stemcells.cam.ac.uk/)
2. Cistrome database
3. Encode TFBS


Current contents of LOLA Extended:

* hg19/hg38
1. Roadmap epigenomics regions
2. JASPAR motif matches


<hr>
<a name="customdb"> </a>
<h3>Can I use a custom database with LOLAweb?</h3>
<p>Yes, you can, but you'll have to run LOLAweb locally. Luckily, we make this easy. LOLAweb provides access to a few curated databases we have put together for you. If you want to run LOLAweb on a custom database, that's also possible, but you'd have to download the app and run it locally. We provide docker containers that make this easy. Alternatively, the LOLA R package can work with custom databases and may be simpler, if you don't want to deal with a containerized setup or running your own local shiny server.</p>

<hr>

<a name="share"> </a>
<h3>Can I share my LOLAweb results with others?</h3>
<p>Yes! Once you run your region sets through LOLAweb, you are redirected to a web page unique to your data. It will look something like this:</p>
Expand Down
Loading

0 comments on commit 1be70d5

Please sign in to comment.