From 1b4f698b451e4a76e791de1624b02d2f07a87b9a Mon Sep 17 00:00:00 2001
From: Owen Marshall
Date: Thu, 28 May 2015 17:29:49 +0100
Subject: [PATCH 1/6] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index c69904c..bfeb109 100644
--- a/README.md
+++ b/README.md
@@ -31,11 +31,11 @@ Prebuilt GATC fragment files used by the script are available for the following
 1. Obtain Bowtie 2 indices provided by [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) or [Illumina's iGenome](http://support.illumina.com/sequencing/sequencing_software/igenome.html)
 Alternatively, build the Bowtie 2 index files manually:
- 1. Download the latest FASTA genome primary_assembly (or toplevel) file from [Ensembl](ftp.ensembl.org/pub/current_fasta/)
+ 1. Download the latest FASTA genome primary_assembly (or toplevel) file from [Ensembl](http://ftp.ensembl.org/pub/current_fasta/)
 e.g. [the current release for *Mus musculus*](http://ftp.ensembl.org/pub/current_fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz)
- (alternatively, for *Drosophila*, download from the [Flybase FTP site](http://ftp.flybase.net/releases/current/)
- e.g. [*D. melanogaster* release 5.57](http://ftp.flybase.net/releases/FB2014_03/dmel_r5.57/fasta/dmel-all-chromosome-r5.57.fasta.gz))
+ (alternatively, for *Drosophila*, download from the [Flybase FTP site](ftp://ftp.flybase.net/releases/current/)
+ e.g. [*D. melanogaster* release 5.57](ftp://ftp.flybase.net/releases/FB2014_03/dmel_r5.57/fasta/dmel-all-chromosome-r5.57.fasta.gz))
 1. Extract the .gz file
 1. Run bowtie2-build in the directory containing the extracted .fasta file.
For the examples above:

From 4e48f34a2835b32c3fdfd83d62cd56782888f110 Mon Sep 17 00:00:00 2001
From: Owen Marshall
Date: Thu, 28 May 2015 17:41:57 +0100
Subject: [PATCH 2/6] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index bfeb109..58ad23c 100644
--- a/README.md
+++ b/README.md
@@ -34,8 +34,8 @@ Prebuilt GATC fragment files used by the script are available for the following
 1. Download the latest FASTA genome primary_assembly (or toplevel) file from [Ensembl](http://ftp.ensembl.org/pub/current_fasta/)
 e.g. [the current release for *Mus musculus*](http://ftp.ensembl.org/pub/current_fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz)
- (alternatively, for *Drosophila*, download from the [Flybase FTP site](ftp://ftp.flybase.net/releases/current/)
- e.g. [*D. melanogaster* release 5.57](ftp://ftp.flybase.net/releases/FB2014_03/dmel_r5.57/fasta/dmel-all-chromosome-r5.57.fasta.gz))
+ (alternatively, for *Drosophila*, download from the Flybase FTP site (ftp://ftp.flybase.net/releases/current/)
+ e.g. ftp://ftp.flybase.net/releases/FB2014_03/dmel_r5.57/fasta/dmel-all-chromosome-r5.57.fasta.gz )
 1. Extract the .gz file
 1. Run bowtie2-build in the directory containing the extracted .fasta file.
For the examples above:

From 033e8d65ac8b9cf0d92eb11f0dd772f0228fe113 Mon Sep 17 00:00:00 2001
From: Owen Marshall
Date: Thu, 28 May 2015 17:43:55 +0100
Subject: [PATCH 3/6] Update README.md

---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 58ad23c..7025db9 100644
--- a/README.md
+++ b/README.md
@@ -7,8 +7,8 @@ We use a single pipeline script to handle sequence alignment, read extension, bi
 ### Download
 Download the latest version of the pipeline script and associated files:
-* [As a zipfile](https://github.com/owenjm/damidseq_pipeline/zipball/master)
 * [As a tarball](https://github.com/owenjm/damidseq_pipeline/tarball/master)
+* [As a zipfile](https://github.com/owenjm/damidseq_pipeline/zipball/master)
 Prebuilt GATC fragment files used by the script are available for the following genomes:
 * [*Drosophila melanogaster* r5.57](https://github.com/owenjm/damidseq_pipeline/raw/gh-pages/pipeline_gatc_files/Dmel_r5.57.GATC.gff.gz)
@@ -26,7 +26,7 @@ Prebuilt GATC fragment files used by the script are available for the following
 ### Installation
-1. Unzip the pipeline script zip file, make the damid_pipeline.pl file executable and place it in your path
+1. Extract the pipeline script archive, make the damid_pipeline file executable and place it in your path
 1. Install [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
 1. Obtain Bowtie 2 indices provided by [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) or [Illumina's iGenome](http://support.illumina.com/sequencing/sequencing_software/igenome.html)
@@ -69,7 +69,7 @@ In order to run correctly, the script needs to know the locations of two paths,
 In order to setup the pipeline to process the *D.
melanogaster* genome, for example, the first-run command would be:
- damidseq_pipeline.pl --gatc_frag_file=path/to/Dmel_r5.57.GATC.gff.gz --bowtie2_genome_dir=path/to/dmel_r5.57/dmel_r.5.57
+ damidseq_pipeline --gatc_frag_file=path/to/Dmel_r5.57.GATC.gff.gz --bowtie2_genome_dir=path/to/dmel_r5.57/dmel_r.5.57
 If these paths do not already exist and the script is run with these options and correct values, the paths will be saved for all future runs unless overridden on the command-line.
@@ -83,13 +83,13 @@ The script will by default determine sample names from the file names, and expec
 To see all available options, run the script with --help command-line option:
- damidseq_pipeline.pl --help
+ damidseq_pipeline --help
 This will give you a list of adjustable parameters and their default and current values if applicable. We recommend keeping these at the default value in most cases; however, these can be modified on the command-line with --option=value (no spaces).
 To save modified values for all future runs, run the script with the parameter you wish to change together with the --save_defaults command-line option:
- damidseq_pipeline.pl --save_defaults
+ damidseq_pipeline --save_defaults
 If bowtie2 and samtools are not in your path, you can specify these on the command-line also.
@@ -105,12 +105,12 @@ Either file can be converted to .tdf format for viewing in [IGV](http://www.broa
 If the user expects to process data from multiple genomes, separate genome specifications can be saved by using the --save_defaults=[name] along with the --bowtie2_genome_dir and --gatc_frag_file options (and any other custom options that the user wishes to set as default for this genome, e.g. the bin width).
For e.g.:
- damidseq_pipeline.pl --save_defaults=fly --gatc_frag_file=path/to/Dmel_r5.57.GATC.gff.gz --bowtie2_genome_dir=path/to/dmel_r5.57/dmel_r.5.57
- damidseq_pipeline.pl --save_defaults=mouse --bins=500 --gatc_frag_file=path/to/MmGRCm38.GATC.gff.gz --bowtie2_genome_dir=path/to/Mm_GRCm38/GRCm38
+ damidseq_pipeline --save_defaults=fly --gatc_frag_file=path/to/Dmel_r5.57.GATC.gff.gz --bowtie2_genome_dir=path/to/dmel_r5.57/dmel_r.5.57
+ damidseq_pipeline --save_defaults=mouse --bins=500 --gatc_frag_file=path/to/MmGRCm38.GATC.gff.gz --bowtie2_genome_dir=path/to/Mm_GRCm38/GRCm38
 Once set up, different genome definitions can be quickly loaded using the --load_defaults=[name] option, e.g.:
- damidseq_pipeline.pl --load_defaults=fly
+ damidseq_pipeline --load_defaults=fly
 All currently saved genome definitions can be listed using --load_defaults=list.

From b5226eac2fcec33c535082c157b0fcb2cffb8cf1 Mon Sep 17 00:00:00 2001
From: Owen Marshall
Date: Thu, 28 May 2015 17:44:44 +0100
Subject: [PATCH 4/6] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7025db9..e27d254 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 Processing DamID-seq data involves extending single-end reads, aligning the reads to the genome and determining the coverage, similar to processing regular ChIP-seq datasets. However, as DamID data is represented as a log2 ratio of (Dam-fusion/Dam), normalisation of the sample and Dam-only control is necessary and adding pseudocounts to mitigate the effect of background counts is highly recommended.
-We use a single pipeline script to handle sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in GFF or bedGraph format.
These files can easily be converted to TDF for viewing in [IGV](http://www.broadinstitute.org/software/igv/) with the provided [gff2tdf.pl](http://github.com/owenjm/damid_pipeline/blob/master/gff2tdf.pl?raw=true) script (see below).
+The damidseq_pipeline is a single script that automatically handles sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in GFF or bedGraph format. These files can easily be converted to TDF for viewing in [IGV](http://www.broadinstitute.org/software/igv/) with the provided [gff2tdf.pl](http://github.com/owenjm/damid_pipeline/blob/master/gff2tdf.pl?raw=true) script (see below).
 ### Download

From 5fd80fb6af4ad0a5dde8e59a1ea5a6ddf98b0555 Mon Sep 17 00:00:00 2001
From: Owen Marshall
Date: Thu, 28 May 2015 17:45:01 +0100
Subject: [PATCH 5/6] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e27d254..2dd3360 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 Processing DamID-seq data involves extending single-end reads, aligning the reads to the genome and determining the coverage, similar to processing regular ChIP-seq datasets. However, as DamID data is represented as a log2 ratio of (Dam-fusion/Dam), normalisation of the sample and Dam-only control is necessary and adding pseudocounts to mitigate the effect of background counts is highly recommended.
-The damidseq_pipeline is a single script that automatically handles sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in GFF or bedGraph format.
These files can easily be converted to TDF for viewing in [IGV](http://www.broadinstitute.org/software/igv/) with the provided [gff2tdf.pl](http://github.com/owenjm/damid_pipeline/blob/master/gff2tdf.pl?raw=true) script (see below).
+damidseq_pipeline is a single script that automatically handles sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in GFF or bedGraph format. These files can easily be converted to TDF for viewing in [IGV](http://www.broadinstitute.org/software/igv/) with the provided [gff2tdf.pl](http://github.com/owenjm/damid_pipeline/blob/master/gff2tdf.pl?raw=true) script (see below).
 ### Download

From e1c4a2b70963ccc2c1038214a46c699d6dec7d1e Mon Sep 17 00:00:00 2001
From: Owen Marshall
Date: Thu, 28 May 2015 17:45:25 +0100
Subject: [PATCH 6/6] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2dd3360..5055ab1 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 Processing DamID-seq data involves extending single-end reads, aligning the reads to the genome and determining the coverage, similar to processing regular ChIP-seq datasets. However, as DamID data is represented as a log2 ratio of (Dam-fusion/Dam), normalisation of the sample and Dam-only control is necessary and adding pseudocounts to mitigate the effect of background counts is highly recommended.
-damidseq_pipeline is a single script that automatically handles sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in GFF or bedGraph format.
These files can easily be converted to TDF for viewing in [IGV](http://www.broadinstitute.org/software/igv/) with the provided [gff2tdf.pl](http://github.com/owenjm/damid_pipeline/blob/master/gff2tdf.pl?raw=true) script (see below).
+[damidseq_pipeline](https://github.com/owenjm/damidseq_pipeline/tarball/master) is a single script that automatically handles sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in GFF or bedGraph format. These files can easily be converted to TDF for viewing in [IGV](http://www.broadinstitute.org/software/igv/) with the provided [gff2tdf.pl](http://github.com/owenjm/damid_pipeline/blob/master/gff2tdf.pl?raw=true) script (see below).
 ### Download