Pipeline release v2.0 (#245)
* Adds workdir cleanup option (#238)

* Adds --cleanup option to clean work directory

* Adds explanatory comment to config for cleanup option

* Adds "Workdir cleanup" param to pipeline summary

* Adds --cleanup option description to help message

* Adds --cleanup option description to usage.md

* Adds singularity check in ci.yml (#240)

* Adds local singularity profile (#239)

* Adds profile to run with singularity locally

* Adds docs on how to run the pipeline locally

* Adds link to new docs to README.md

* Renames profile to singularity

* Adds singularity check in ci.yml (#240)

* Renames singularity_local -> singularity everywhere

Co-authored-by: cgpu <[email protected]>

* Adds workflow.onComplete notifications

* Reduces ci tests (#241)

* Reduces ci tests to only run on pull_request, not push

* Reduces ci tests to only run with max_retries: 1

* Updates ci test nextflow version 19.04.0 -> 20.01.0

* Removes max_retries matrix option from ci tests

Co-authored-by: cgpu <[email protected]>

* Adds trimmomatic logs to multiqc report (#244)

* Adds trimmomatic logs to multiqc report

* Puts Trimmomatic above STAR in MultiQC report

* Updates all tools and moves docker containers to anczukowlab (#248)

* Update usage.md

Best solution to #246 is to update documentation.

* Update usage.md

* Update run_on_sumner.md

Adding some clarification to address #247

* Update run_on_sumner.md

Adding the description to run pipeline with bams.csv to documentation

* Implements .command.* save in results/ (#251)

* Implements .command.* save in results/

* Cherry-pick residue removal

* Updates ci strategy to fail-fast: false

* Fix for "Unknown method `optional` on FileInParam type"

The mistake was that I pasted the directive into the input block instead of the output block.
The failed CI run is here:
https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/pull/251/checks?check_run_id=3419423845#step:5:51
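In Nextflow DSL1 (the syntax that raises the `FileInParam` error above), `optional` is valid only on output file declarations, never on inputs. A minimal sketch — process and channel names here are illustrative, not taken from this pipeline:

```nextflow
// Illustrative only -- process/channel names are assumptions, not from this repo.
// `optional` on an input raises "Unknown method `optional` on FileInParam type";
// on an output it correctly marks the file as allowed to be absent.
process star_align {
  input:
  file reads from reads_ch                 // no `optional` allowed here

  output:
  file "*.Unmapped.out.mate*" optional true into unmapped_ch   // valid placement

  script:
  """
  echo "aligning ${reads}"
  """
}
```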

* Adds ${task.process} in command-log results naming

* Adds publishDir pattern: negation for logs in all results

* Adds tree view step for verifying results

* Removes install tree [already available] [docker]

* Removes install tree [already available] [singularity]

* Improves folder structure per sample for logs

* Keeps only .command.log, .command.sh, .command.err
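A hedged sketch of how such a log-publishing rule can look in Nextflow (the output directory layout and the exact glob are assumptions, not copied from the pipeline):

```nextflow
// Sketch: copy only the three kept command files for each task into
// results/process-logs/<process-name>/, using ${task.process} for the grouping.
// Assumes the process also declares the hidden .command.* files as outputs.
publishDir "${params.outdir}/process-logs/${task.process}", mode: 'copy',
    pattern: "{.command.log,.command.sh,.command.err}"
```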

* Adds config param (CloudOS configs)

* Removes redundancy of .bw files in star_mapped folder

After feedback from @angarb, who spotted this redundancy,
we are removing the .bw file from the ${SRR}/<all-files> folder
and keeping it only in ${SRR}/<all-files>/all_bigwig

├── star_mapped
│   ├── SRR4238351
│   │   ├── SRR4238351.Aligned.sortedByCoord.out.bam
│   │   ├── SRR4238351.Aligned.sortedByCoord.out.bam.bai
│   │   ├── SRR4238351.Log.final.out
│   │   ├── SRR4238351.Log.out
│   │   ├── SRR4238351.Log.progress.out
│   │   ├── SRR4238351.ReadsPerGene.out.tab
│   │   ├── SRR4238351.SJ.out.tab
│   │   ├── SRR4238351.Unmapped.out.mate1
│   │   └── SRR4238351.bw
│   └── all_bigwig
│       ├── SRR4238351.bw
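One way to express this in Nextflow, shown as a sketch with assumed paths and variable names (returning `null` from `saveAs` skips publishing a file):

```nextflow
// Sketch only -- output paths and the `name` variable are assumptions.
// Skip the per-sample copy of the bigwig...
publishDir "${params.outdir}/star_mapped/${name}", mode: 'copy',
    saveAs: { filename -> filename.endsWith('.bw') ? null : filename }
// ...and publish it once under star_mapped/all_bigwig instead.
publishDir "${params.outdir}/star_mapped/all_bigwig", mode: 'copy', pattern: "*.bw"
```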

* Sets fail-fast strategy to false in singularity CI

Complementary commit to 09b8787

* Fix for indentation

* Fix saving files (#263)

* Fixes *_data/* files not being saved for multiqcs step

* Fixes sample_lst.txt not being saved for prep_de step

* Fixes no files being saved for stringtie_merge step

* Fix prep_de step input

* Saves tmp/*_read_outcomes_by_bam.txt in both rmats proc

* Adds XS tag strType parameter (#264)

* Adds sra test (#253)

* Adds SRA test profile

* Adds sra_test to ci tests

* Changes CI strategy to fail-fast:false

* CI syntax fix [previous commit]

* Parameterises echo in process scope

* Adds echo true for ci debugging

* Fixes sra-toolkit run with singularity

* Change sra example file to a really small one

* Revert the main container version to the newer one

* Removes commented unnecessary docker.runOptions line

* Removes failing sra_test ci test for singularity

* Fix star step in ci test

* Makes the STAR issue solution more robust

* Returns errorStrategy = 'finish'

Co-authored-by: cgpu <[email protected]>

* Add b1 (control) and b2 (case) to docs

* permissions changes

* Fix strType issue when stranded=false (#276)

* Update Dockerfile

Accidentally edited this file.

* Parametrize error strategy (clean pr) (#267)

* Parametrizes error strategy

* Adds error_strategy parameter to usage.md docs

* Update log.info to show actual errorStrategy value

* Fix typo

* Adds Changelog to README.md

* Outsources changelog into a separate file

* Fixes containers and parametrizes options in google.config (#281)

* Fixes containers being overwritten by google.config

* Parametrize google options

* [DEL 3039] Implement ftp download for SRA accessions (#2) (#283)

* [DEL 3039] Implement ftp download for SRA accessions (#2)

* add option for read download through FTP

* fix ftp path

* update information on download_from param

* Update run_on_sumner.md

* Fix FTP link generation; add test configs for both pair and single end data

* add catch for when single end run ends in _1

* Update main.nf

Co-authored-by: Vlad-Dembrovskyi <[email protected]>

* Changes http to ftp in get_ftp_accession

Because this works now!!

* Make sra example data smaller

* Re-enable sra_test for singularity, now with ftp

Co-authored-by: Vlad-Dembrovskyi <[email protected]>
Co-authored-by: Vlad-Dembrovskyi <[email protected]>

* Update docs/run_on_sumner.md

Co-authored-by: cgpu <[email protected]>

Co-authored-by: imendes93 <[email protected]>
Co-authored-by: cgpu <[email protected]>

* Create Copying_Files_From_Sumner_to_Cloud

These are instructions on how to copy files from the JAX HPC Sumner to the cloud. Addresses #139

* Rename Copying_Files_From_Sumner_to_Cloud to Copying_Files_From_Sumner_to_Cloud.md

* Makes saving of unmapped files optional, cleanup true by default  (#284)

* [DEL 3039] Implement ftp download for SRA accessions (#2)

* add option for read download through FTP

* fix ftp path

* update information on download_from param

* Update run_on_sumner.md

* Fix FTP link generation; add test configs for both pair and single end data

* add catch for when single end run ends in _1

* Update main.nf

Co-authored-by: Vlad-Dembrovskyi <[email protected]>

* Changes http to ftp in get_ftp_accession

Because this works now!!

* Make sra example data smaller

* Re-enable sra_test for singularity, now with ftp

Co-authored-by: Vlad-Dembrovskyi <[email protected]>
Co-authored-by: Vlad-Dembrovskyi <[email protected]>

* Update docs/run_on_sumner.md

Co-authored-by: cgpu <[email protected]>

* Add #274 , Set cleanup default as true (#3)

Co-authored-by: imendes93 <[email protected]>
Co-authored-by: cgpu <[email protected]>

* Sets default --cleanup false for google.config

* Adds --save_unmapped to usage.md

* Update usage.md

Updating config info

* Create NF_splicing_pipeline.config

Example NF Config

* Update run_on_sumner.md

* Update run_on_sumner.md

* Update usage.md

* Update main.nf

Updating to match usage.md parameter descriptions

* Update main.nf

* set errorStrategy to finish in base.config for sumner

* Update run_on_sumner.md

* Update help message

* Update usage.md to match main.nf help message

* Update usage.md to fix indentations

* Update usage.md

Moving NF tips from "running pipeline on Sumner"

* Update run_on_sumner.md

Moved NF tips to usage.md

* Update Copying_Files_From_Sumner_to_Cloud.md

* Fix: Removes rmats container from google.config

* Update changelog.md for v2.0

Co-authored-by: cgpu <[email protected]>
Co-authored-by: angarb <[email protected]>
Co-authored-by: Brittany Angarola <[email protected]>
Co-authored-by: imendes93 <[email protected]>
5 people authored Nov 24, 2021
1 parent 12a3227 commit a330894
Showing 29 changed files with 948 additions and 187 deletions.
34 changes: 29 additions & 5 deletions .github/workflows/ci.yml
@@ -1,14 +1,15 @@
name: splicing-pipelines-nf CI
# This workflow is triggered on pushes and PRs to the repository.
-on: [push, pull_request]
+on: [pull_request]

jobs:
-test:
+docker:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
-nxf_ver: ['19.04.0', '']
-max_retries: [1, 10]
+nxf_ver: ['20.01.0', '']
+test_type: ['ultra_quick_test', 'sra_test']
steps:
- uses: actions/checkout@v1
- name: Install Nextflow
@@ -18,4 +19,27 @@ jobs:
sudo mv nextflow /usr/local/bin/
- name: Basic workflow tests
run: |
-nextflow run ${GITHUB_WORKSPACE} --max_retries ${{ matrix.max_retries }} -profile base,ultra_quick_test,docker
+nextflow run ${GITHUB_WORKSPACE} -profile base,${{ matrix.test_type }},docker
echo "Results tree view:" ; tree -a results
singularity:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
singularity_version: ['3.6.4']
nxf_ver: ['20.01.0', '']
test_type: ['ultra_quick_test', 'sra_test']
steps:
- uses: actions/checkout@v1
- uses: eWaterCycle/setup-singularity@v6
with:
singularity-version: ${{ matrix.singularity_version }}
- name: Install Nextflow
run: |
export NXF_VER=${{ matrix.nxf_ver }}
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Basic workflow tests
run: |
nextflow run ${GITHUB_WORKSPACE} -profile base,${{ matrix.test_type }},singularity --echo true
echo "Results tree view:" ; tree -a results
Empty file modified DAG.png
100644 → 100755
52 changes: 52 additions & 0 deletions NF_splicing_pipeline.config
@@ -0,0 +1,52 @@
params {
// Input data:
reads = 'reads.csv'
rmats_pairs = 'rmats_pairs.txt'
run_name = 'B6_finalrun'
download_from = false
key_file = false

// Main arguments:
gtf = '/projects/anczukow-lab/reference_genomes/mouse_black6/Gencode/gencode.vM23.primary_assembly.annotation.gtf'
assembly_name = 'GRCm38'
star_index = '/projects/anczukow-lab/reference_genomes/mouse_black6/Gencode/star_overhangs_2.7.9a/star_2.7.9a_GRCm38_150.tar.gz'
singleEnd = false
stranded = 'first-strand'
readlength = 150

// Trimmomatic:
minlen = 20
slidingwindow = true

//Star:
mismatch = 5
filterScore = 0.66
sjdbOverhangMin = 3
soft_clipping = true
save_unmapped = false

//rMATS:
statoff = false
paired_stats = false
novelSS = false
mil = 50
mel = 500

//Other:
test = false
max_cpus = 72
max_memory = 760.GB
max_time = 72.h
skiprMATS = false
skipMultiQC = false
mega_time = 20.h
debug = false
error_strategy = 'finish'
cleanup = false
}

cleanup = params.cleanup
process {
errorStrategy = params.error_strategy
...
}
8 changes: 7 additions & 1 deletion README.md
@@ -76,6 +76,12 @@ Documentation about the pipeline, found in the [`docs/`](docs) directory:
3. [Running the pipeline](docs/usage.md)
* [Running on Sumner](docs/run_on_sumner.md)
* [Running on CloudOS](docs/run_on_cloudos.md)
* [Running locally](docs/run_locally.md)

## Pipeline DAG
-<img src="https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/blob/c23ee9552eb033dec087fb3b6fb01fe26716ce29/DAG.png" alt="splicing_pip_dag" align = "left" width="600"/>
+<img src="https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/blob/c23ee9552eb033dec087fb3b6fb01fe26716ce29/DAG.png" alt="splicing_pip_dag" align = "center" width="600"/>


## Changelog

View changelog at [changelog.md](https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/blob/master/changelog.md)
2 changes: 2 additions & 0 deletions assets/sra-user-settings.mkfg
@@ -0,0 +1,2 @@
/LIBS/IMAGE_GUID = "aee5f45c-f469-45f1-95f2-b2d2b1c59163"
/libs/cloud/report_instance_identity = "true"
41 changes: 41 additions & 0 deletions changelog.md
@@ -0,0 +1,41 @@
# Changelog

### v2.0 - Pipeline improvements

#### Improvements:
- Adds saving of all the process `.command.*` log files to the results/process-logs folder (#251)
- Adds pipeline workdir `--cleanup` option to clear all intermediate files on successful pipeline completion (true by default, false for CloudOS) (#238, #284, [089d6e3](https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/pull/245/commits/3b71e038b186bb2bc92debacb02aede7b5dae917))
- Adds pipeline `--error_strategy` parameter to be able to specify pipeline error strategy directly from command line (doesn't work if specified in config linked by `-c` or `-config` nextflow params) (#267)
- Parametrizes google executor parameters so that pipeline can now be run on different CloudOS environments (#281)
- Adds a new `--download_from` option `FTP` mode to download SRA samples from [EBI FTP](https://ftp.sra.ebi.ac.uk/vol1/fastq/) (#283)
- Adds new parameter `--save_unmapped` that makes saving of STAR unmapped files optional (false by default) (#284)

#### Fixes:
- Adds missing trimmomatic logs to the multiqc report (#244)
- Implements correct support for input strandedness in the STAR process when `--stranded` is `second-strand` (was hardcoded to `strType=2` and only supported `first-strand` or `false` before) (#264)
- Fixes issue where the stringtie_merge results folder, as well as some other folders, was missing some or all files (#263)
- Fixes pipeline crash when `params.stranded` was set to `false` (#276)
- Fixes old parameters in google.config that were undesirably overwriting nextflow.config parameters on CloudOS (#281, [217e202](https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/pull/245/commits/217e202cab3264c9d2d4cafe80b2476a2d837a85))

#### Updates:
- Updates the following tools: (#248)
- **STAR** `2.7.3` -> `2.7.9a` NOTE: Requires a new index! (updated in test profile)
- **Samtools** `1.10` -> `1.13`
- **StringTie** `2.1.3b` -> `2.1.7`
- **Gffread** `0.11.7` -> `0.12.7`
- multiqc `1.8` -> `1.11`
- deeptools `3.4.0` -> `3.5.1`
- bioconductor-rtracklayer `1.46.0` -> `1.52.0`
- gffcompare `0.11.2` -> `0.12.6`
- bedtools `2.29.2` -> `2.30.0`
- sra-tools `2.10.8` -> `2.11.0`
- pigz `2.3.4` -> `2.6.0`
- gdc-client `1.5.0` -> `1.6.1`
- Moves all containers to https://hub.docker.com/u/anczukowlab

#### Maintenance:
- Considerably reduces the number of redundant basic CI tests by completely removing the `max_retries` matrix and the `push` trigger from `on: [push, pull_request]`
- Adds CI test for sra-downloading pipeline pathway (only supported with docker profile for now) (#253)


### v1.0 - Initial pipeline release
31 changes: 31 additions & 0 deletions conf/examples/sra_test.config
@@ -0,0 +1,31 @@
/*
* -------------------------------------------------
* Nextflow config file for running SRA download tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run jacksonlabs/splicing-pipelines-nf -profile sra_test
*
*/

params {
// Input data
singleEnd = true
reads = "$baseDir/examples/testdata/sra/sra.csv"
download_from = 'FTP'

// Genome references
gtf = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/genes.gtf'
star_index = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/star_2.7.9a_yeast_chr_I.tar.gz'

// Other
test = true
readlength = 500
// This doesn't make biological sense but prevents all reads being removed during trimming
overhang = 100

// Limit resources
max_cpus = 2
max_memory = 6.GB
max_time = 48.h
}
31 changes: 31 additions & 0 deletions conf/examples/sra_test_paired.config
@@ -0,0 +1,31 @@
/*
* -------------------------------------------------
* Nextflow config file for running SRA paired-end tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run jacksonlabs/splicing-pipelines-nf -profile sra_test_paired
*
*/

params {
// Input data
singleEnd = false
reads = "$baseDir/examples/testdata/sra/sra_test_paired_end.csv"
download_from = 'FTP'

// Genome references
gtf = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/genes.gtf'
star_index = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/star_2.7.9a_yeast_chr_I.tar.gz'

// Other
test = true
readlength = 500
// This doesn't make biological sense but prevents all reads being removed during trimming
overhang = 100

// Limit resources
max_cpus = 2
max_memory = 6.GB
max_time = 48.h
}
31 changes: 31 additions & 0 deletions conf/examples/sra_test_single.config
@@ -0,0 +1,31 @@
/*
* -------------------------------------------------
* Nextflow config file for running SRA single-end tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run jacksonlabs/splicing-pipelines-nf -profile sra_test_single
*
*/

params {
// Input data
singleEnd = true
reads = "$baseDir/examples/testdata/sra/sra_test_single_end.csv"
download_from = 'FTP'

// Genome references
gtf = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/genes.gtf'
star_index = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/star_2.7.9a_yeast_chr_I.tar.gz'

// Other
test = true
readlength = 500
// This doesn't make biological sense but prevents all reads being removed during trimming
overhang = 100

// Limit resources
max_cpus = 2
max_memory = 6.GB
max_time = 48.h
}
2 changes: 1 addition & 1 deletion conf/examples/ultra_quick_test.config
@@ -15,7 +15,7 @@ params {

// Genome references
gtf = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/genes.gtf'
-star_index = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/star.tar.gz'
+star_index = 'https://lifebit-featured-datasets.s3-eu-west-1.amazonaws.com/projects/jax/splicing-pipelines-nf/star_2.7.9a_yeast_chr_I.tar.gz'

// Other
test = true
2 changes: 1 addition & 1 deletion conf/executors/base.config
@@ -15,7 +15,7 @@ params {

process {

-errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'terminate' }
+errorStrategy = 'finish'
maxRetries = params.max_retries
maxErrors = '-1'

20 changes: 8 additions & 12 deletions conf/executors/google.config
@@ -1,16 +1,13 @@
google {
lifeSciences.bootDiskSize = params.gls_boot_disk_size
-lifeSciences.preemptible = true
-zone = 'us-east1-b'
-network = 'jax-poc-lifebit-01-vpc-network'
-subnetwork = 'us-east1-sub2'
+lifeSciences.preemptible = params.gls_preemptible
+zone = params.zone
+network = params.network
+subnetwork = params.subnetwork
}

docker.enabled = true

executor {
name = 'google-lifesciences'
}

params {

@@ -21,12 +18,15 @@ params {
// disk-space allocations for stringtie_merge and rmats
// this default size is based on 100 samples
gc_disk_size = "2000 GB"

cleanup = false // Don't change, otherwise CloudOS jobs won't be resumable by default even if user wants to.

executor = 'google-lifesciences'
}

process {
maxRetries = params.max_retries
errorStrategy = { task.attempt == process.maxRetries ? 'ignore' : task.exitStatus in [3,9,10,14,143,137,104,134,139] ? 'retry' : 'ignore' }
container = 'gcr.io/nextflow-250616/splicing-pipelines-nf:gawk'
withName: 'get_accession' {
disk = "50 GB"
cpus = {check_max(8 * task.attempt, 'cpus')}
@@ -72,25 +72,21 @@ process {
disk = params.gc_disk_size
cpus = {check_max(8 * task.attempt, 'cpus')}
memory = {check_max(30.GB * task.attempt, 'memory')}
container = 'gcr.io/nextflow-250616/splicing-pipelines-nf:gawk'
}
withName: 'prep_de' {
disk = params.gc_disk_size
cpus = {check_max(8 * task.attempt, 'cpus')}
memory = {check_max(30.GB * task.attempt, 'memory')}
container = 'gcr.io/nextflow-250616/splicing-pipelines-nf:gawk'
}
withName: 'rmats' {
disk = params.gc_disk_size
cpus = { check_max (30 * task.attempt, 'cpus')}
memory = { check_max( 120.GB * task.attempt, 'memory' ) }
container = 'gcr.io/nextflow-250616/rmats:4.1.0'
}
withName: 'paired_rmats' {
disk = params.gc_disk_size
cpus = {check_max(30 * task.attempt, 'cpus')}
memory = {check_max(120.GB * task.attempt, 'memory')}
container = 'gcr.io/nextflow-250616/rmats:4.1.0'
}
withName: 'multiqc' {
disk = "10 GB"
17 changes: 17 additions & 0 deletions conf/executors/singularity.config
@@ -0,0 +1,17 @@
/*
* -------------------------------------------------
* Nextflow config file for running pipeline with Singularity locally
* -------------------------------------------------
* Base config needed for running with -profile singularity
*/

params {
singularity_cache = "local_singularity_cache"
}

singularity {
enabled = true
cacheDir = params.singularity_cache
autoMounts = true
}

10 changes: 5 additions & 5 deletions containers/download_reads/environment.yml
@@ -5,8 +5,8 @@ channels:
- defaults
- anaconda
dependencies:
-  - sra-tools=2.10.8
-  - pigz=2.3.4
-  - gdc-client=1.5.0
-  - samtools=1.10
-  - bedtools=2.29.2
+  - sra-tools=2.11.0
+  - pigz=2.6.0
+  - gdc-client=1.6.1
+  - samtools=1.13
+  - bedtools=2.30.0
2 changes: 1 addition & 1 deletion containers/fasp/Dockerfile
@@ -17,7 +17,7 @@ RUN apt-get update \
&& cd fasp-scripts \
&& python setup.py install \
&& chmod +x fasp/scripts/* \
-&& conda install samtools=1.11 -c bioconda -c conda-forge \
+&& conda install samtools=1.13 -c bioconda -c conda-forge \
&& conda clean -a

ENV PATH /fasp-scripts/fasp/scripts:$PATH
