Methylation #543

Draft
wants to merge 60 commits into
base: main
7f341d2
feat: accept base modification workflow and input
KloostermanJoukje Oct 26, 2023
7bc46f6
feat: add first draft vip base modification workflow
KloostermanJoukje Nov 1, 2023
b83599b
feat: added first draft base modification config file
KloostermanJoukje Nov 1, 2023
ab020d6
fix: input only tsv file not pod5 dir
KloostermanJoukje Nov 1, 2023
36bf8e7
feat: added working dorado tool
KloostermanJoukje Nov 13, 2023
a8e4baf
feat: added config for dorado tool
KloostermanJoukje Nov 13, 2023
47a37b5
fix: no process named samtools fixed
KloostermanJoukje Nov 13, 2023
3d9ca87
Merge branch 'main' into PoC/Methylation, use vip relase 7.0.0
KloostermanJoukje Nov 13, 2023
44fb4a6
Feat: added modules for pipeline
KloostermanJoukje Nov 16, 2023
d76589e
feat: updated workflow for modkit
KloostermanJoukje Nov 16, 2023
9aa3601
feat: updated config for modkit
KloostermanJoukje Nov 16, 2023
e4028d0
fix: updated modules and added template dorado tool
KloostermanJoukje Nov 21, 2023
da3d75f
Update: newest vip version
KloostermanJoukje Nov 21, 2023
8ef202f
docs: updated usage in README.md
KloostermanJoukje Nov 28, 2023
ddd2212
fix: improved workflow and added comments
KloostermanJoukje Nov 29, 2023
32520ca
fix: improved dorado process and added comments
KloostermanJoukje Nov 29, 2023
29b1611
fix: improved process and added comments
KloostermanJoukje Nov 29, 2023
3778998
fix: improved process samtools and added comments
KloostermanJoukje Nov 29, 2023
bc5cade
fix: updated config mod file and added comments
KloostermanJoukje Nov 29, 2023
a06be6b
fix: added label to samtools workflow
KloostermanJoukje Nov 29, 2023
6017920
feat: change to reference genome build 38
KloostermanJoukje Dec 11, 2023
3ae3dc9
feat: add tool methplotlib for methylation frequency per region
KloostermanJoukje Dec 12, 2023
3ee0bab
feat: add toolconfig for tool methplotlib
KloostermanJoukje Dec 12, 2023
a0aeedb
bug: methplotlib tool does not yet work
KloostermanJoukje Dec 13, 2023
ebdd11e
feat: add conversion bam to cram to continue with vip pipeline
KloostermanJoukje Dec 13, 2023
de6c4fd
fix: modified base pipeline until making of cram file
KloostermanJoukje Dec 14, 2023
dda383e
Merge branch 'main' into PoC/Methylation, update to newest version vi…
KloostermanJoukje Dec 14, 2023
ac46037
Merge branch 'main' into PoC/Methylation, update to vip 7.2.0
KloostermanJoukje Dec 15, 2023
15299e7
fix: undefined parameter
KloostermanJoukje Dec 15, 2023
8325265
fix: undefined parameter
KloostermanJoukje Dec 15, 2023
879e6fe
Merge branch 'main' of https://github.com/molgenis/vip into PoC/Methy…
KloostermanJoukje Dec 21, 2023
9b4eed8
fix: reference no longer defined null and cpg island detected with mo…
KloostermanJoukje Dec 22, 2023
2d47c88
fixcpg island detected with modkit
KloostermanJoukje Dec 22, 2023
32c886a
Feat: add report to accept bedMethyl files
KloostermanJoukje Jan 22, 2024
9117441
Feat: fix bed to bedmethyl bug, add h to m conversion
KloostermanJoukje Jan 22, 2024
3ee31d8
Refactor: delete unessecary code and update variables
KloostermanJoukje Jan 22, 2024
2369eb5
Refactor: change workflow name from mod to pod5
KloostermanJoukje Jan 23, 2024
8fa5388
refactor: change module mod directory to pod5
KloostermanJoukje Jan 23, 2024
da010a8
Refactor: delete old mod workflow files
KloostermanJoukje Jan 23, 2024
93e19e4
Update docs
KloostermanJoukje Jan 23, 2024
6f0bf24
Fix: pod5 to cram workflow variables found
KloostermanJoukje Jan 23, 2024
45f5f91
Feat: added first modkit "draft" def file
KloostermanJoukje Jan 23, 2024
591a3c9
Fix: cargo not found
KloostermanJoukje Jan 23, 2024
5dd56d4
Refactor: other method for mdokit def file
KloostermanJoukje Jan 24, 2024
9892628
Add extension to bed file
KloostermanJoukje Jan 24, 2024
a10fa30
Refactor: commit added to modkit.def
KloostermanJoukje Jan 25, 2024
5c01e15
Feat: use docker images for Dorado and Modkit
KloostermanJoukje Jan 25, 2024
f5edd79
Feat: add docker images to pod5 workflow
KloostermanJoukje Jan 27, 2024
300b8e6
Feat: add a pod5 workflow test
KloostermanJoukje Jan 27, 2024
229fc8c
Fix: fix typo in template parameter
KloostermanJoukje Jan 27, 2024
e9e9012
Feat: validate dorado model parameter
KloostermanJoukje Jan 29, 2024
a96f6eb
Add modkit and dorado image to install.sh
KloostermanJoukje Jan 29, 2024
fe06d5f
Refactor: cleanup code and add test to docs
KloostermanJoukje Jan 29, 2024
51080df
Fix: download Dorado correctly
KloostermanJoukje Jan 29, 2024
0a5a4ff
Feat: Add dorado Fast, HAC and SUP models to resources
KloostermanJoukje Jan 29, 2024
3d9f831
Update requirements.md
KloostermanJoukje Jan 30, 2024
bac6588
Add: add installation of tool to README.md
KloostermanJoukje Jan 30, 2024
b0fc308
Update docs lead to merge branch 'PoC/Methylation' of https://github.…
KloostermanJoukje Jan 30, 2024
c45f66f
README.md: need to load awscli to execute pod5 test
KloostermanJoukje Jan 30, 2024
7ba380c
README.md: add path to output test pod5
KloostermanJoukje Jan 30, 2024
38 changes: 37 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@ bash vip/install.sh
### Usage
```bash
usage: vip -w <arg> -i <arg> -o <arg>
-w, --workflow <arg> workflow to execute. allowed values: cram, fastq, gvcf, vcf
-w, --workflow <arg> workflow to execute. allowed values: cram, fastq, gvcf, vcf, pod5
-i, --input <arg> path to sample sheet .tsv
-o, --output <arg> output folder
-c, --config <arg> path to additional nextflow .cfg (optional)
@@ -39,5 +39,41 @@ pip install mkdocs mkdocs-mermaid2-plugin
mkdocs serve
```

## Proof of Concept - Methylation
The following files and directories were added or adapted to support base modification and POD5 data:
```
config/nxf_pod5.config
config/nxf_vcf.config
docs/
modules/pod5/
modules/vcf/report.nf
modules/vcf/templates/report.sh
resources/pod5/
test/suites/pod5/
utils/build.sh
vip_pod5.nf
vip_vcf.nf
vip.sh
install.sh
```

## How to install VIP and test this branch
```
# Clone repository and switch to PoC/Methylation branch
git clone https://github.com/molgenis/vip.git
cd vip
git checkout PoC/Methylation

# Install to download tools
bash install.sh

# Test the pod5 workflow
cd test
ml awscli
bash test.sh -t pod5

# Output can be found in test/output/
```

### License
VIP is an aggregate work of many works, each covered by their own licence(s). For the purposes of determining what you can do with specific works in VIP, this policy should be read together with the licence(s) of the relevant tools. For the avoidance of doubt, where any other licence grants rights, this policy does not modify or reduce those rights under those licences.
39 changes: 39 additions & 0 deletions config/nxf_pod5.config
@@ -0,0 +1,39 @@
includeConfig 'nxf.config'
includeConfig 'nxf_cram.config'

// Environmental commands
env {
CMD_DORADO = "apptainer exec --nv --no-mount home --bind \${TMPDIR} ${APPTAINER_CACHEDIR}/dorado-shac28cd94f2303b0493a4b16ca86e711852c2b8525.sif dorado"
CMD_MODKIT = "apptainer exec --no-mount home --bind \${TMPDIR} ${APPTAINER_CACHEDIR}/modkit-sha3745cd8f97213eaf908f5fbf4f2f8b8e2cedfc30.sif modkit"
}

// Process how to execute
process {
withLabel: 'dorado'{
executor = 'slurm'
Review comment: Do not hardcode; the executor is selected by the user or auto-selected elsewhere.
memory = '40GB'
time = '10h'
cpus = 20
clusterOptions = '--gres=gpu:a40:1 --qos=priority'
Review comment: Do not hardcode the QOS; this is selected by the user.
Review comment: Do not assume that GPUs are available. I suggest making this configurable.
}

withLabel: 'sort_bam'{
executor = 'slurm'
Review comment: Remove this line.
memory = '10GB'
time = '10h'
cpus = 10
}

withLabel: 'modkit'{
executor = 'slurm'
Review comment: Remove this line.
memory = '30GB'
time = '10h'
cpus = 5
}
}

// Parameters used in workflow pod5
params {
dorado_model = "${projectDir}/resources/pod5/[email protected]/"
}
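
The review comments on this file ask that the executor, QOS and GPU request be left to the user. One possible way to do that, sketched here under assumptions, is to keep site-specific scheduler settings out of `config/nxf_pod5.config` and pass them through the `-c, --config` option listed in the usage text. The file name `site_pod5.cfg` is hypothetical, and the `clusterOptions` value is simply copied from the diff above as one site's example. The `vip` command is printed rather than executed so the sketch runs without a VIP installation.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical file name; any path accepted by `vip --config` would do.
site_config="site_pod5.cfg"

# Site-specific scheduler settings live here instead of in config/nxf_pod5.config.
cat > "${site_config}" <<'EOF'
process {
  withLabel: 'dorado' {
    // value copied from the diff above; adjust or drop on clusters without GPUs
    clusterOptions = '--gres=gpu:a40:1 --qos=priority'
  }
}
EOF

# Print the command instead of running it.
echo "vip --workflow pod5 --input path/to/samplesheet.tsv --output path/to/output/folder --config ${site_config}"
```

With this split, the shipped config would only carry resource defaults (memory, time, cpus), and whether a GPU or a QOS is requested becomes a per-site decision.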

6 changes: 4 additions & 2 deletions config/nxf_vcf.config
@@ -37,7 +37,7 @@ process {
}

withLabel: 'vcf_report' {
memory = '4GB'
memory = '100GB'
Review comment: Please document why the process requires 100GB instead of 4GB.
}
}

@@ -106,9 +106,11 @@ params {

report {
include_crams = true
include_bedmethyls = true
max_records = ""
max_samples = ""
template = ""
template = "${projectDir}/resources/pod5/pod5_template.html"
vcf_report_jar = "${projectDir}/resources/pod5/pod5-vcf-report.jar"

GRCh38 {
genes = "${projectDir}/resources/GRCh38/GCF_000001405.39_GRCh38.p13_genomic_mapped.gff.gz"
4 changes: 3 additions & 1 deletion docs/about/acknowledgements.md
@@ -23,4 +23,6 @@ Standing on the shoulders of giants. This project could not have possible withou
- [cuteSV](https://github.com/tjiangHIT/cuteSV)
- [Straglr](https://github.com/philres/straglr)
- [Stranger](https://github.com/Clinical-Genomics/stranger)
- [fastp](https://github.com/OpenGene/fastp)
- [fastp](https://github.com/OpenGene/fastp)
- [Dorado](https://github.com/nanoporetech/dorado)
- [Modkit](https://github.com/nanoporetech/modkit)
18 changes: 18 additions & 0 deletions docs/examples/pod5.md
@@ -0,0 +1,18 @@
# POD5
To run vip with POD5 data, just specify the POD5 paths in your sample sheet.

## Samplesheet
The example below shows the sample sheet for a run starting from `pod5` files.

```
individual_id pod5
your_sample_id path/to/your/data_1.pod5,path/to/your/data_2.pod5
```

## Run the pipeline
```bash
cd vip
vip --workflow pod5 --input path/to/samplesheet.tsv --output path/to/output/folder
```
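
Because the sample sheet is a `.tsv` file, the columns must be separated by real tab characters, which are easy to lose when copying the example above. The following is a minimal sketch of generating the sheet with `printf` and checking the column count; the output file name and the sample values are placeholders taken from the example, not required names.

```bash
#!/bin/bash
set -euo pipefail

samplesheet="samplesheet.tsv"

# Header and one sample row; printf expands \t to a tab character.
printf 'individual_id\tpod5\n' > "${samplesheet}"
# Multiple POD5 files for one sample are comma-separated within the pod5 column.
printf '%s\t%s\n' "your_sample_id" "path/to/your/data_1.pod5,path/to/your/data_2.pod5" >> "${samplesheet}"

# Sanity check: count lines that do not have exactly two tab-separated columns.
bad_lines="$(awk -F'\t' 'NF != 2 { n++ } END { print n + 0 }' "${samplesheet}")"
echo "lines with wrong column count: ${bad_lines}"
```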

Review comment: This example would benefit from an actual report that displays methylation. Admittedly, other sections of the VIP documentation could benefit from more examples as well. As a user, it is hard to understand how methylation is beneficial.

For an example on how to execute the `pod5` workflow see [here](https://github.com/molgenis/vip/blob/229fc8c6d01bfb9e0dcdfee85d6e903b31f71f7a/test/suites/pod5/hg001_giab_2023.05.sh#L16C1-L16C28)
2 changes: 1 addition & 1 deletion docs/get_started/requirements.md
@@ -5,7 +5,7 @@ Before installing VIP please check whether your system meets the following requi
- Bash ≥ 3.2
- Java ≥ 11
- [Apptainer](https://apptainer.org/docs/admin/main/installation.html#install-from-pre-built-packages) (setuid installation)
- 8GB RAM <sup>1</sup>
- 100GB RAM <sup>1</sup>
Review comment: Note to self: update when the configured RAM is updated.
- 150GB disk space

1) The memory requirements differ per workflow and depend, on the size of your input data, the scheduler that you use, the amount of parallelization. For example, executing VIP using a job scheduler will reduce the memory requirements on the system submitting the jobs to 1-2GB.
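
The software requirements listed above can be checked before installing. This is only a sketch and not part of VIP: the helper names `have`, `bash_ok` and `report` are made up for illustration, the Java check only confirms that `java` is on the `PATH` (it does not verify the version), and RAM and disk space are not checked.

```bash
#!/bin/bash
set -euo pipefail

# Return success if the given command is on PATH.
have() {
  command -v "$1" > /dev/null 2>&1
}

# Bash >= 3.2, using the version array that Bash itself provides.
bash_ok() {
  local major="${BASH_VERSINFO[0]}" minor="${BASH_VERSINFO[1]}"
  [ "${major}" -gt 3 ] || { [ "${major}" -eq 3 ] && [ "${minor}" -ge 2 ]; }
}

# $1 = label, remaining arguments = check to run.
report() {
  local label="$1"
  shift
  if "$@"; then
    echo "ok      ${label}"
  else
    echo "missing ${label}"
  fi
}

report "bash >= 3.2" bash_ok
report "java on PATH (version not checked)" have java
report "apptainer on PATH" have apptainer
```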
4 changes: 3 additions & 1 deletion docs/home/key_features.md
@@ -2,9 +2,11 @@
VIP is an easy to install, easy to use, portable and flexible pipeline implemented using [Nextflow](https://www.nextflow.io/).
Features include:

- Workflows for a broad range of input file types: `bam`, `cram`, `fastq`, `g.vcf`, `vcf`
- Workflows for a broad range of input file types: `pod5`, `bam`, `cram`, `fastq`, `g.vcf`, `vcf`
- Produces stand-alone variant interpretation HTML report with integrated genome browser
- Long-read sequencing support (Oxford Nanopore, PacBio HiFi)
- Supports base modification in `cram` files with methylation tags: [SAMtags](https://samtools.github.io/hts-specs/SAMtags.pdf)
- Supports bedmethyl visualisation in genome browser
- Short-read sequencing support (Illumina, both single and paired-end reads)
- Supports GRCh38, supports GRCh37 and T2T via liftover
- Short variant detection
4 changes: 2 additions & 2 deletions docs/index.md
@@ -1,15 +1,15 @@
# Variant Interpretation Pipeline (VIP)
VIP is a flexible human variant interpretation pipeline for rare disease using state-of-the-art pathogenicity prediction ([CAPICE](https://github.com/molgenis/capice)) and template-based interactive reporting to facilitate decision support.

The VIP pipeline can be used starting from either your `fastq`, `bam/cram` or `.g.vcf/vcf` data,
The VIP pipeline can be used starting from either your `pod5`, `fastq`, `bam/cram` or `.g.vcf/vcf` data,
every entry point will result in a `vcf` file with your annotated, classified and filtered variants
as well as a interactive HTML report with the same variants, prioritized by the CAPICE pathogenicity score
and providing additional aids like a genome browser and a representation of the decisions leading to the VIP classification.
VIP can be used for single patients, families or cohort data.

[Click here for a live example](vip_giab_hg001.html)

![Example report](img/report_example.png)]
![Example report](img/report_example.png)

*Above: report example*

3 changes: 2 additions & 1 deletion docs/usage/command-line-options.md
@@ -7,7 +7,7 @@ In addition to the `.vcf.gz` an interactive `.html` report is produced that can

```
usage: vip -w <arg> -i <arg> -o <arg>
-w, --workflow <arg> workflow to execute. allowed values: cram, fastq, gvcf, vcf
-w, --workflow <arg> workflow to execute. allowed values: cram, fastq, gvcf, vcf, pod5
-i, --input <arg> path to sample sheet .tsv
-o, --output <arg> output folder
-c, --config <arg> path to additional nextflow .cfg (optional)
@@ -30,6 +30,7 @@ usage: vip -w <arg> -i <arg> -o <arg>
By default `vip`:

- Assumes an Illumina sequencing platform was used to generate the input data
- Assumes Nanopore sequencing was used to generate input data for `pod5` workflow
- Assumes whole-genome sequencing (WGS) method was used to generate the input data
- Uses a GRCh38 reference genome ([GCA_000001405.15 / GCF_000001405.26](https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/))
- Provides classification trees for default variant filtration. For details, see [here](../advanced/classification_trees.md)
5 changes: 5 additions & 0 deletions docs/usage/config.md
@@ -22,6 +22,11 @@ An additional configuration file can be supplied on the command-line to overwrit
**Warning:**
Please take note of the fact that for a different reference fasta.gz the unzipped referenfasta file is also required. Both the zipped and unzipped fasta should have an index.

### POD5
| key | default | description |
|---------------------------|-------------|--------------------------------------------------------------------------------------------------------|
| dorado_model | *installed* | for details, see [here](https://github.com/nanoporetech/dorado) |

### FASTQ
| key | default | description |
|---------------------------|-------------|--------------------------------------------------------------------------------------------------------|
6 changes: 6 additions & 0 deletions docs/usage/input.md
@@ -38,6 +38,11 @@ The following sections describe the columns that can be used in every sample-she

<sup>1</sup> Exception: if no probands are defined in the sample-sheet then all samples are considered to be probands.

## Columns: POD5
| column | type | required | default | description |
|-------------------------|----------|----------|--------------|-------------------------------------------------------------------------------------------------------------|
| ``pod5`` | ``file`` | yes | | allowed file extensions: ``pod5`` |

## Columns: FASTQ
| column | type | required | default | description |
|-------------------------|---------------|-----------------|--------------|-------------------------------------------------------------------------------------------------------------|
@@ -68,3 +73,4 @@ The following sections describe the columns that can be used in every sample-she
| ``assembly`` | ``enum`` | | ``GRCh38`` | allowed values: [``GRCh37``, ``GRCh38``, ``T2T``], value must be the same for all project samples |
| ``vcf`` | ``file`` | yes | | allowed file extensions: [``vcf``, ``vcf.gz``, ``vcf.bgz``, ``bcf``, ``bcf.gz``, ``bcf.bgz``], value must be the same for all project samples |
| ``cram`` | ``file`` | | | allowed file extensions: [``bam``, ``cram``, ``sam``] |
| ``bedmethyl``| ``file`` | | | allowed file extensions: ``bedmethyl`` |
19 changes: 15 additions & 4 deletions docs/usage/workflow.md
@@ -1,8 +1,19 @@
# Workflow
VIP consists of four workflows depending on the type of input data: fastq, bam/cram, gvcf or vcf.
The `fastq` workflow is an extension of the `cram` workflow. The `cram` and `gvcf` workflows are extensions of the `vcf` workflow.
VIP consists of five workflows depending on the type of input data: pod5, fastq, bam/cram, gvcf or vcf.
The `fastq` and `pod5` workflows RE an extension of the `cram` workflow. The `cram` and `gvcf` workflows are extensions of the `vcf` workflow.
Review comment (suggested change):
- The `fastq` and `pod5` workflows RE an extension of the `cram` workflow. The `cram` and `gvcf` workflows are extensions of the `vcf` workflow.
+ The `fastq` and `pod5` workflows are extensions of the `cram` workflow. The `cram` and `gvcf` workflows are extensions of the `vcf` workflow.
The `vcf` workflow produces the pipeline outputs as described [here](./output.md).
The following sections provide an overview of the steps of each of these workflows.
The following sections provide an overview of the steps of each of these workflows.

## POD5
The `pod5` workflow consists of the following steps:

1. Parallelize sample sheet per sample and for each sample
2. Modified basecalling and alignment using [Dorado](https://github.com/nanoporetech/dorado) producing a `bam` file per sample
3. Sorting the `bam` file per sample and create an index and stats file using [Samtools](http://samtools.github.io/)
4. Perform pileup with [Modkit](https://github.com/nanoporetech/modkit) to construct a bedMethyl table per sample
Review comment (suggested change):
- 4. Perform pileup with [Modkit](https://github.com/nanoporetech/modkit) to construct a bedMethyl table per sample
+ 4. Perform pileup with [Modkit](https://github.com/nanoporetech/modkit) to construct a bedMethyl file per sample
5. Continue with step 3. of the `cram` workflow

For details, see [here](https://github.com/molgenis/vip/blob/main/vip_pod5.nf).
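
Steps 2 to 4 of the `pod5` workflow can be sketched as a dry run that prints the commands per sample instead of executing them. The Dorado and Modkit invocations and `samtools index -c` follow the templates in this pull request; the `samtools sort` and `samtools stats` invocations are assumptions, since `samtools.sh` is not shown in this diff. All paths and the sample name are hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical inputs; in VIP these come from the sample sheet and config.
model="path/to/dorado_model"
pod5_dir="path/to/pod5_dir"
reference="path/to/reference.fasta"
name="sample"

# Print a command instead of running it (dry run).
show() {
  echo "+ $*"
}

# Step 2: modified basecalling and alignment (stdout would be redirected to ${name}.bam)
show dorado basecaller "${model}" "${pod5_dir}" --modified-bases 5mCG_5hmCG --reference "${reference}"
# Step 3: sort, index (CSI) and stats; the sort and stats calls are assumptions
show samtools sort -o "${name}_sorted.bam" "${name}.bam"
show samtools index -c "${name}_sorted.bam"
show samtools stats "${name}_sorted.bam"
# Step 4: convert h to m, index the converted bam, then pile up into a bedMethyl file
show modkit adjust-mods "${name}_sorted.bam" "${name}_converted.bam" --convert h m
show samtools index -c "${name}_converted.bam"
show modkit pileup "${name}_converted.bam" "${name}.bedmethyl" --cpg --ref "${reference}" --only-tabs
```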

## FASTQ
The `fastq` workflow consists of the following steps:
@@ -24,7 +35,7 @@ The `cram` workflow consists of the following steps:
1. Using [ExpansionHunter](https://github.com/Illumina/ExpansionHunter) for Illumina short read data.
2. Using this [fork of Straglr](https://github.com/philres/straglr) for PacBio and Nanopore long read data, this fork is chosen over the original [Straglr](https://github.com/bcgsc/straglr) because of the VCF output that enables VIP to combine it with the SV and SNV data in the VCF workflow.
4. Parallelize cram in chunks consisting of one or more contigs and for each chunk
1. Perform short variant calling with [DeepVariant](https://github.com/google/deepvariant) producing a `gvcf` file per chunk per sample, the gvcfs of the samples in a project are than merged to one vcf per project (using [GLnexus](https://github.com/dnanexus-rnd/GLnexus).
1. Perform short variant calling with [DeepVariant](https://github.com/google/deepvariant) producing a `gvcf` file per chunk per sample, the gvcfs of the samples in a project are than merged to one vcf per project (using [GLnexus](https://github.com/dnanexus-rnd/GLnexus)).
2. Perform structural variant calling with [Manta](https://github.com/Illumina/manta) or [cuteSV](https://github.com/tjiangHIT/cuteSV) producing a `vcf` file per chunk per project.
5. Concatenate short variant calling and structural variant calling `vcf` files per chunk per sample
6. Continue with step 3. of the `vcf` workflow
2 changes: 2 additions & 0 deletions install.sh
@@ -74,12 +74,14 @@ download_files() {
urls+=("c7655e4ffce0178a1a0dcc0ed097cd8f" "images/cutesv-2.0.3.sif")
urls+=("8efa3c0f6c0f5378ca22d16074f50dfe" "images/deepvariant-1.6.0.sif")
urls+=("b67e8c1d774c0d22de70b7be79aaa05e" "images/deepvariant_deeptrio-1.6.0.sif")
urls+=("8d7a34c469bbd1d27c324a867713cd4b" "images/dorado-shac28cd94f2303b0493a4b16ca86e711852c2b8525.sif")
urls+=("78a8ce16c9d8bac53e5fbca4f763dcef" "images/expansionhunter-5.0.0.sif")
urls+=("afed919dc16ccdae1869cf6dbc5a19d5" "images/fastp-0.23.4.sif")
urls+=("494c8c9e1031828f48027e34032de423" "images/gado-1.0.3.sif")
urls+=("d25ba2124ef883b1b6f7a2eff2cb8201" "images/glnexus_v1.4.5-patched.sif")
urls+=("ff8aceb2c9f185307a69b981ba08efd8" "images/manta-1.6.0.sif")
urls+=("1e0caddbdd755bf608ef024e3d0a2f19" "images/minimap2-2.26.sif")
urls+=("7422915ce79a9dc120cb82fa4f2c06dd" "images/modkit-sha3745cd8f97213eaf908f5fbf4f2f8b8e2cedfc30.sif")
urls+=("06ac8a76a307fa42fffd80ab906fd24b" "images/picard-3.1.1.sif")
urls+=("9a4b685b26744113d3ea0a3904c02706" "images/samtools-1.17-patch1.sif")
urls+=("2c18fcda2660792a7c8ba390463ae7ac" "images/straglr-philres-1.4.2.sif")
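
The `urls` array above alternates a 32-character hex string with a relative file path. Assuming those strings are MD5 checksums of the downloaded files (how `install.sh` actually uses them is not shown in this diff), a pairwise verification could be sketched as follows. The toy file, the `verify_all` function and the temporary directory are made up for illustration, and `md5sum` from GNU coreutils is assumed to be available.

```bash
#!/bin/bash
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "${workdir}"' EXIT

# Toy file standing in for a downloaded image; its checksum is computed
# here only so that the example is self-contained.
mkdir -p "${workdir}/images"
printf 'example\n' > "${workdir}/images/example.sif"
expected="$(md5sum "${workdir}/images/example.sif" | cut -d ' ' -f 1)"

# Same layout as in install.sh: checksum, path, checksum, path, ...
urls=()
urls+=("${expected}" "images/example.sif")

# Walk the array two entries at a time and compare checksums.
verify_all() {
  local i md5 path actual
  for ((i = 0; i < ${#urls[@]}; i += 2)); do
    md5="${urls[i]}"
    path="${urls[i + 1]}"
    actual="$(md5sum "${workdir}/${path}" | cut -d ' ' -f 1)"
    if [ "${actual}" = "${md5}" ]; then
      echo "ok       ${path}"
    else
      echo "mismatch ${path}"
      return 1
    fi
  done
}

verify_all
```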
19 changes: 19 additions & 0 deletions modules/pod5/dorado.nf
@@ -0,0 +1,19 @@
process dorado {
// Basecall pod5 files using Dorado
label 'dorado'
publishDir "$params.output/intermediates", mode: 'link'

input:
tuple val(meta), path(pod5)

output:
tuple val(meta), path(bam)

shell:
reference=params[params.assembly].reference.fasta
bam="${meta.project.id}_${meta.sample.family_id}_${meta.sample.individual_id}.bam"

template "dorado.sh"


}
25 changes: 25 additions & 0 deletions modules/pod5/modkit.nf
@@ -0,0 +1,25 @@
process modkit {
// Process bam files using the Modkit tool

label 'modkit'
publishDir "$params.output/intermediates", mode: 'link'

input:
tuple val(meta), path(sorted_bam), path(sorted_bam_index)

output:
tuple val(meta), path(bedmethyl)

shell:
refSeqPath = params[params.assembly].reference.fasta
reference = refSeqPath.substring(0, refSeqPath.lastIndexOf('.'))
name = "${meta.project.id}_${meta.sample.family_id}_${meta.sample.individual_id}"
bedmethyl = "${name}.bedmethyl"
converted_bam = "${name}_converted.bam"
converted_bam_index = "${name}_converted.bam.csi"
summary_modkit = "${name}_summary_modkit.txt"
log_modkit = "${name}_modkit.log"

template 'modkit.sh'

}
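
The `reference` assignment in the process above removes the last extension from the configured fasta path, which presumably turns a compressed `.fasta.gz` path into the unzipped `.fasta` that the documentation says must also be present. The same operation in Bash, as a sketch with a hypothetical path:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical path to a compressed reference.
ref_seq_path="resources/GRCh38/reference.fasta.gz"

# ${var%.*} removes the shortest suffix matching ".*", i.e. the last extension.
reference="${ref_seq_path%.*}"

echo "${reference}"   # prints resources/GRCh38/reference.fasta
```

Both variants strip whatever the last extension is, so a configured path that already ends in `.fasta` would lose that extension instead; checking for a compression suffix first would make the step more robust.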
19 changes: 19 additions & 0 deletions modules/pod5/samtools.nf
@@ -0,0 +1,19 @@
process sort_bam {
// Sort bam files using SAMTools
label "sort_bam"
publishDir "$params.output/intermediates", mode: 'link'

input:
tuple val(meta), path(bam)

output:
tuple val(meta), path(sortedBam), path(sortedBamIndex), path(sortedBamStats)

shell:
sortedBam="${meta.project.id}_${meta.sample.family_id}_${meta.sample.individual_id}_sorted.bam"
sortedBamIndex="${meta.project.id}_${meta.sample.family_id}_${meta.sample.individual_id}_sorted.bam.csi"
sortedBamStats="${meta.project.id}_${meta.sample.family_id}_${meta.sample.individual_id}_sorted.bam.stats"

template 'samtools.sh'

}
14 changes: 14 additions & 0 deletions modules/pod5/templates/dorado.sh
@@ -0,0 +1,14 @@
#!/bin/bash
set -euo pipefail

mod_basecaller() {
# Command for Dorado tool
echo "working"
Review comment: Can be removed.
${CMD_DORADO} basecaller !{params.dorado_model} ./ --modified-bases 5mCG_5hmCG --reference !{reference} > !{bam}
Review comment: The command fails if variables contain characters such as a space. Suggested change:
- ${CMD_DORADO} basecaller !{params.dorado_model} ./ --modified-bases 5mCG_5hmCG --reference !{reference} > !{bam}
+ ${CMD_DORADO} basecaller "!{params.dorado_model}" ./ --modified-bases 5mCG_5hmCG --reference "!{reference}" > "!{bam}"
}

main() {
mod_basecaller
}

main "$@"
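
The quoting issue raised in the review can be reproduced in isolation. The sketch below uses a hypothetical model path containing a space and a made-up helper, `count_args`, that reports how many arguments it received. In the template the `!{...}` placeholders are substituted before Bash parses the script, but the effect on argument splitting is the same.

```bash
#!/bin/bash
set -euo pipefail

# Print how many arguments the shell passed to this function.
count_args() {
  echo "$#"
}

# Hypothetical model directory with a space in its name.
model="my models/hac"

unquoted="$(count_args ${model})"    # word splitting yields two arguments
quoted="$(count_args "${model}")"    # quoting keeps the path as one argument

echo "unquoted: ${unquoted}"   # prints "unquoted: 2"
echo "quoted: ${quoted}"       # prints "quoted: 1"
```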
25 changes: 25 additions & 0 deletions modules/pod5/templates/modkit.sh
@@ -0,0 +1,25 @@
#!/bin/bash
set -euo pipefail

summary() {
# Use modkit tool to summarize bam files
${CMD_MODKIT} summary !{sorted_bam} > !{summary_modkit}
Review comment (suggested change):
- ${CMD_MODKIT} summary !{sorted_bam} > !{summary_modkit}
+ ${CMD_MODKIT} summary "!{sorted_bam}" > "!{summary_modkit}"
}

adjust_mod() {
${CMD_MODKIT} adjust-mods !{sorted_bam} !{converted_bam} --convert h m
Review comment (suggested change):
- ${CMD_MODKIT} adjust-mods !{sorted_bam} !{converted_bam} --convert h m
+ ${CMD_MODKIT} adjust-mods "!{sorted_bam}" "!{converted_bam}" --convert h m
${CMD_SAMTOOLS} index -c !{converted_bam}
Review comment (suggested change):
- ${CMD_SAMTOOLS} index -c !{converted_bam}
+ ${CMD_SAMTOOLS} index -c "!{converted_bam}"
}

pileup() {
# Use modkit tool to process bam to bedmethyl file
${CMD_MODKIT} pileup !{converted_bam} !{bedmethyl} --cpg --ref !{reference} --only-tabs --log-filepath !{log_modkit}
Review comment (suggested change):
- ${CMD_MODKIT} pileup !{converted_bam} !{bedmethyl} --cpg --ref !{reference} --only-tabs --log-filepath !{log_modkit}
+ ${CMD_MODKIT} pileup "!{converted_bam}" "!{bedmethyl}" --cpg --ref "!{reference}" --only-tabs --log-filepath "!{log_modkit}"
}

main() {
summary
adjust_mod
pileup
}

main "$@"
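
To get a quick impression of what the resulting bedMethyl file contains, the tab-separated output can be summarised with `awk`. This sketch assumes the column layout described in the Modkit documentation, where column 10 is the valid coverage and column 12 the number of modified calls; verify this against the Modkit version in use. The two rows are made-up toy values, not real data, and the file name is hypothetical.

```bash
#!/bin/bash
set -euo pipefail

bedmethyl="toy.bedmethyl"

# Two made-up rows (not real data). Assumed columns: 10 = valid coverage, 12 = modified calls.
printf 'chr1\t100\t101\tm\t10\t+\t100\t101\t255,0,0\t10\t80.00\t8\t2\t0\t0\t0\t0\t0\n' > "${bedmethyl}"
printf 'chr1\t200\t201\tm\t20\t+\t200\t201\t255,0,0\t20\t25.00\t5\t15\t0\t0\t0\t0\t0\n' >> "${bedmethyl}"

# Coverage-weighted methylation percentage over all rows.
mean_pct="$(awk -F'\t' '{ cov += $10; mod += $12 } END { if (cov > 0) printf "%.2f", 100 * mod / cov }' "${bedmethyl}")"
echo "${mean_pct}"
```

Weighting by coverage avoids giving a position with few reads the same influence as a well-covered one.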