Merge branch 'master' of https://github.com/owenjm/damid_pipeline
owenjm committed Jun 1, 2015
2 parents 0a4bc33 + e1c4a2b commit b6367c7
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions README.md
@@ -2,13 +2,13 @@

Processing DamID-seq data involves extending single-end reads, aligning the reads to the genome and determining the coverage, similar to processing regular ChIP-seq datasets. However, as DamID data is represented as a log2 ratio of (Dam-fusion/Dam), normalisation of the sample and Dam-only control is necessary and adding pseudocounts to mitigate the effect of background counts is highly recommended.
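As a rough illustration of the ratio described above, the following Python sketch scales both samples to a common read depth, adds a pseudocount to every bin and takes log2(Dam-fusion/Dam). The function name `log2_ratio`, the total-count scaling and the pseudocount value of 1.0 are assumptions made for this example; they are not taken from the pipeline script and may differ from its actual normalisation method and defaults.

```python
import math


def log2_ratio(fusion_counts, dam_counts, pseudocount=1.0):
    """Return per-bin log2(Dam-fusion / Dam-only) ratios.

    Assumptions (illustrative only, not the pipeline's method):
    both samples are scaled down to the smaller total read count,
    and the same pseudocount is added to every bin of both samples.
    """
    if len(fusion_counts) != len(dam_counts):
        raise ValueError("samples must have the same number of bins")
    fusion_total = sum(fusion_counts)
    dam_total = sum(dam_counts)
    if fusion_total == 0 or dam_total == 0:
        raise ValueError("each sample needs at least one read")

    # Scale both samples to the same (smaller) sequencing depth.
    target = min(fusion_total, dam_total)
    fusion_scale = target / fusion_total
    dam_scale = target / dam_total

    ratios = []
    for f, d in zip(fusion_counts, dam_counts):
        # The pseudocount keeps empty bins from producing log2(0) or a
        # division by zero, and damps ratios driven by background counts.
        f_norm = f * fusion_scale + pseudocount
        d_norm = d * dam_scale + pseudocount
        ratios.append(math.log2(f_norm / d_norm))
    return ratios


if __name__ == "__main__":
    fusion = [30, 0, 10, 0]
    dam = [10, 0, 60, 10]
    print([round(r, 2) for r in log2_ratio(fusion, dam)])
```

Bins that are empty in both samples come out at exactly 0 here, rather than being undefined, which is the practical reason for adding pseudocounts before taking the ratio.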

-We use a single pipeline script to handle sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in GFF or bedGraph format. These files can easily be converted to TDF for viewing in [IGV](http://www.broadinstitute.org/software/igv/) with the provided [gff2tdf.pl](http://github.com/owenjm/damid_pipeline/blob/master/gff2tdf.pl?raw=true) script (see below).
+[damidseq_pipeline](https://github.com/owenjm/damidseq_pipeline/tarball/master) is a single script that automatically handles sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in GFF or bedGraph format. These files can easily be converted to TDF for viewing in [IGV](http://www.broadinstitute.org/software/igv/) with the provided [gff2tdf.pl](http://github.com/owenjm/damid_pipeline/blob/master/gff2tdf.pl?raw=true) script (see below).
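The read extension and binned counts steps named above can be pictured with a minimal Python sketch. It is not the script's implementation: the function name `binned_counts`, the forward-strand-only handling and the default values of `extend_to` and `bin_width` are hypothetical choices made for illustration.

```python
def binned_counts(read_starts, chrom_length, extend_to=300, bin_width=75):
    """Count extended single-end reads overlapping each fixed-width bin.

    Assumptions (illustrative only): reads are given as 0-based start
    positions on the forward strand of one chromosome, and every read is
    extended to `extend_to` bases, clipped at the chromosome end.
    """
    n_bins = (chrom_length + bin_width - 1) // bin_width  # ceiling division
    counts = [0] * n_bins
    for start in read_starts:
        if not 0 <= start < chrom_length:
            continue  # ignore reads that fall outside the chromosome
        end = min(start + extend_to, chrom_length)  # exclusive end
        first_bin = start // bin_width
        last_bin = (end - 1) // bin_width
        # Each bin touched by the extended read is incremented once.
        for b in range(first_bin, last_bin + 1):
            counts[b] += 1
    return counts


if __name__ == "__main__":
    print(binned_counts([0, 100], chrom_length=400, extend_to=150, bin_width=100))
```

Per-bin counts of this kind, computed separately for the Dam-fusion and Dam-only samples, are the sort of input a log2 ratio step would then consume.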

### Download

Download the latest version of the pipeline script and associated files:
-* [As a zipfile](https://github.com/owenjm/damidseq_pipeline/zipball/master)
* [As a tarball](https://github.com/owenjm/damidseq_pipeline/tarball/master)
+* [As a zipfile](https://github.com/owenjm/damidseq_pipeline/zipball/master)

Prebuilt GATC fragment files used by the script are available for the following genomes:
* [*Drosophila melanogaster* r5.57](https://github.com/owenjm/damidseq_pipeline/raw/gh-pages/pipeline_gatc_files/Dmel_r5.57.GATC.gff.gz)
@@ -26,16 +26,16 @@ Prebuilt GATC fragment files used by the script are available for the following

### Installation

-1. Unzip the pipeline script zip file, make the damid_pipeline.pl file executable and place it in your path
+1. Extract the pipeline script archive, make the damid_pipeline file executable and place it in your path
1. Install [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
1. Obtain Bowtie 2 indices provided by [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) or [Illumina's iGenome](http://support.illumina.com/sequencing/sequencing_software/igenome.html)

Alternatively, build the Bowtie 2 index files manually:
-1. Download the latest FASTA genome primary_assembly (or toplevel) file from [Ensembl](ftp.ensembl.org/pub/current_fasta/)
+1. Download the latest FASTA genome primary_assembly (or toplevel) file from [Ensembl](http://ftp.ensembl.org/pub/current_fasta/)
e.g. [the current release for *Mus musculus*](http://ftp.ensembl.org/pub/current_fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz)

-(alternatively, for *Drosophila*, download from the [Flybase FTP site](http://ftp.flybase.net/releases/current/)
-e.g. [*D. melanogaster* release 5.57](http://ftp.flybase.net/releases/FB2014_03/dmel_r5.57/fasta/dmel-all-chromosome-r5.57.fasta.gz))
+(alternatively, for *Drosophila*, download from the Flybase FTP site (ftp://ftp.flybase.net/releases/current/)
+e.g. ftp://ftp.flybase.net/releases/FB2014_03/dmel_r5.57/fasta/dmel-all-chromosome-r5.57.fasta.gz )
1. Extract the .gz file
1. Run bowtie2-build in the directory containing the extracted .fasta file. For the examples above:

@@ -69,7 +69,7 @@ In order to run correctly, the script needs to know the locations of two paths,

In order to setup the pipeline to process the *D. melanogaster* genome, for example, the first-run command would be:

-damidseq_pipeline.pl --gatc_frag_file=path/to/Dmel_r5.57.GATC.gff.gz --bowtie2_genome_dir=path/to/dmel_r5.57/dmel_r.5.57
+damidseq_pipeline --gatc_frag_file=path/to/Dmel_r5.57.GATC.gff.gz --bowtie2_genome_dir=path/to/dmel_r5.57/dmel_r.5.57

If these paths do not already exist and the script is run with these options and correct values, the paths will be saved for all future runs unless overridden on the command-line.

@@ -83,13 +83,13 @@ The script will by default determine sample names from the file names, and expec

To see all available options, run the script with --help command-line option:

-damidseq_pipeline.pl --help
+damidseq_pipeline --help

This will give you a list of adjustable parameters and their default and current values if applicable. We recommend keeping these at the default value in most cases; however, these can be modified on the command-line with --option=value (no spaces).

To save modified values for all future runs, run the script with the parameter you wish to change together with the --save_defaults command-line option:

-damidseq_pipeline.pl --save_defaults
+damidseq_pipeline --save_defaults

If bowtie2 and samtools are not in your path, you can specify these on the command-line also.

@@ -105,12 +105,12 @@ Either file can be converted to .tdf format for viewing in [IGV](http://www.broa

If the user expects to process data from multiple genomes, separate genome specifications can be saved by using the --save_defaults=[name] along with the --bowtie2_genome_dir and --gatc_frag_file options (and any other custom options that the user wishes to set as default for this genome, e.g. the bin width). For e.g.:

-damidseq_pipeline.pl --save_defaults=fly --gatc_frag_file=path/to/Dmel_r5.57.GATC.gff.gz --bowtie2_genome_dir=path/to/dmel_r5.57/dmel_r.5.57
-damidseq_pipeline.pl --save_defaults=mouse --bins=500 --gatc_frag_file=path/to/MmGRCm38.GATC.gff.gz --bowtie2_genome_dir=path/to/Mm_GRCm38/GRCm38
+damidseq_pipeline --save_defaults=fly --gatc_frag_file=path/to/Dmel_r5.57.GATC.gff.gz --bowtie2_genome_dir=path/to/dmel_r5.57/dmel_r.5.57
+damidseq_pipeline --save_defaults=mouse --bins=500 --gatc_frag_file=path/to/MmGRCm38.GATC.gff.gz --bowtie2_genome_dir=path/to/Mm_GRCm38/GRCm38

Once set up, different genome definitions can be quickly loaded using the --load_defaults=[name] option, e.g.:

-damidseq_pipeline.pl --load_defaults=fly
+damidseq_pipeline --load_defaults=fly

All currently saved genome definitions can be listed using --load_defaults=list.
