Methylation #543

Draft: wants to merge 60 commits into main


dennishendriksen
Contributor

End-to-end tests are not executed by Travis CI; please execute them manually:

  • APPTAINER_BIND=$PWD bash test/test.sh passes
  • Updated documentation

KloostermanJoukje and others added 30 commits October 26, 2023 10:04
A module to run dorado basecalling.
A module to run modkit after sorting bam files
Added a template file for modkit tool.
Updated modules to be more generic. The config file needs a path to the data and a run ID.
Added Dorado module for shell command.
Merge branch 'main' into PoC/Methylation

mod_basecaller() {
# Command for Dorado tool
echo "working"
Contributor Author


The `echo "working"` line can be removed.

mod_basecaller() {
# Command for Dorado tool
echo "working"
${CMD_DORADO} basecaller !{params.dorado_model} ./ --modified-bases 5mCG_5hmCG --reference !{reference} > !{bam}

The command fails if a variable contains characters such as a space:

Suggested change
${CMD_DORADO} basecaller !{params.dorado_model} ./ --modified-bases 5mCG_5hmCG --reference !{reference} > !{bam}
${CMD_DORADO} basecaller "!{params.dorado_model}" ./ --modified-bases 5mCG_5hmCG --reference "!{reference}" > "!{bam}"
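A minimal illustration of the failure mode, assuming a reference path that contains a space (the path and the `count_args` helper are hypothetical). Nextflow substitutes `!{...}` placeholders into the script text before Bash runs it, so an unquoted value is split into separate words, which behaves like the unquoted variable below:

```shell
#!/usr/bin/env bash
# Hypothetical reference path containing a space.
reference="my refs/GRCh38.fasta"

# Print how many arguments a command would receive.
count_args() {
  echo "$#"
}

# Unquoted: the value is split into two words, so 3 arguments are passed.
count_args --reference ${reference}

# Quoted: the value stays a single word, so 2 arguments are passed.
count_args --reference "${reference}"
```

Double quotes protect against spaces and glob characters, but because the substitution is textual they do not protect against values that themselves contain `"`, `$` or backticks.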


summary() {
# Use modkit tool to summarize bam files
${CMD_MODKIT} summary !{sorted_bam} > !{summary_modkit}

Suggested change
${CMD_MODKIT} summary !{sorted_bam} > !{summary_modkit}
${CMD_MODKIT} summary "!{sorted_bam}" > "!{summary_modkit}"

}

adjust_mod() {
${CMD_MODKIT} adjust-mods !{sorted_bam} !{converted_bam} --convert h m

Suggested change
${CMD_MODKIT} adjust-mods !{sorted_bam} !{converted_bam} --convert h m
${CMD_MODKIT} adjust-mods "!{sorted_bam}" "!{converted_bam}" --convert h m


adjust_mod() {
${CMD_MODKIT} adjust-mods !{sorted_bam} !{converted_bam} --convert h m
${CMD_SAMTOOLS} index -c !{converted_bam}

Suggested change
${CMD_SAMTOOLS} index -c !{converted_bam}
${CMD_SAMTOOLS} index -c "!{converted_bam}"

}

withLabel: 'modkit'{
executor = 'slurm'

Remove this.

@@ -5,7 +5,7 @@ Before installing VIP please check whether your system meets the following requi
- Bash ≥ 3.2
- Java ≥ 11
- [Apptainer](https://apptainer.org/docs/admin/main/installation.html#install-from-pre-built-packages) (setuid installation)
- 8GB RAM <sup>1</sup>
- 100GB RAM <sup>1</sup>

Note to self: update this once the RAM setting in the config is updated.

VIP consists of four workflows depending on the type of input data: fastq, bam/cram, gvcf or vcf.
The `fastq` workflow is an extension of the `cram` workflow. The `cram` and `gvcf` workflows are extensions of the `vcf` workflow.
VIP consists of five workflows depending on the type of input data: pod5, fastq, bam/cram, gvcf or vcf.
The `fastq` and `pod5` workflows RE an extension of the `cram` workflow. The `cram` and `gvcf` workflows are extensions of the `vcf` workflow.

Suggested change
The `fastq` and `pod5` workflows RE an extension of the `cram` workflow. The `cram` and `gvcf` workflows are extensions of the `vcf` workflow.
The `fastq` and `pod5` workflows are extensions of the `cram` workflow. The `cram` and `gvcf` workflows are extensions of the `vcf` workflow.

1. Parallelize sample sheet per sample and for each sample
2. Modified basecalling and alignment using [Dorado](https://github.com/nanoporetech/dorado) producing a `bam` file per sample
3. Sort the `bam` file per sample and create an index and stats file using [Samtools](http://samtools.github.io/)
4. Perform pileup with [Modkit](https://github.com/nanoporetech/modkit) to construct a bedMethyl table per sample

Suggested change
4. Perform pileup with [Modkit](https://github.com/nanoporetech/modkit) to construct a bedMethyl table per sample
4. Perform pileup with [Modkit](https://github.com/nanoporetech/modkit) to construct a bedMethyl file per sample
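The per-sample steps 2 to 4 listed above can be sketched as a plain Bash function. This is not VIP's implementation: it assumes `dorado`, `samtools` and `modkit` are on `PATH`, uses hypothetical file names and a hypothetical `process_sample` helper, reuses the Dorado options from the PR's template, leaves out the per-sample parallelization of step 1, and assumes `samtools stats` produces the stats file. With `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a command (shell-quoted) when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%q ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Like run, but redirects the command's stdout to the file given as first argument.
run_to() {
  local out="$1"
  shift
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%q ' "$@"
    printf '> %q\n' "${out}"
  else
    "$@" > "${out}"
  fi
}

# Hypothetical helper: pod5 directory -> bedMethyl file for one sample.
process_sample() {
  local model="$1" pod5_dir="$2" reference="$3" sample="$4"

  # Step 2: modified basecalling and alignment; Dorado writes BAM to stdout.
  run_to "${sample}.bam" dorado basecaller "${model}" "${pod5_dir}" \
    --modified-bases 5mCG_5hmCG --reference "${reference}"

  # Step 3: sort, index and collect statistics (samtools stats is an assumption).
  run samtools sort -o "${sample}.sorted.bam" "${sample}.bam"
  run samtools index "${sample}.sorted.bam"
  run_to "${sample}.stats.txt" samtools stats "${sample}.sorted.bam"

  # Step 4: pileup of modified-base calls into a bedMethyl file.
  run modkit pileup "${sample}.sorted.bam" "${sample}.bedmethyl"
}
```

For example, `DRY_RUN=1 process_sample <model> pod5_dir/ reference.fasta sample1` prints the five commands without running them.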

```
cd vip
vip --workflow pod5 --input path/to/samplesheet.tsv --output path/to/output/folder
```


This example would benefit from an actual report that displays methylation. Admittedly, other sections of the VIP documentation could benefit from more examples as well. As a user, it is hard to understand how methylation data is beneficial.

dennishendriksen left a comment

A new or modified test case in test/suites/vcf that uses a bedMethyl file would have been nice.


vip "${args[@]}" 1> /dev/null

compare expected to actual output and store result

The test currently fails because this comment line is missing its leading `#`, so Bash tries to run it as a command named `compare`:

Suggested change
compare expected to actual output and store result
# compare expected to actual output and store result
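A quick way to confirm the diagnosis: the uncommented line is well-formed shell syntax, so `bash -n` accepts it, but at run time Bash looks up a command named `compare`. The sketch below runs it with a deliberately unusable `PATH` (an assumption made so that an unrelated `compare` program, such as the one shipped with ImageMagick, is not picked up), which yields exit status 127, "command not found".

```shell
#!/usr/bin/env bash
# The line from test.sh without its leading '#'.
snippet='compare expected to actual output and store result'

# Syntax check only: succeeds, because this is a well-formed simple command.
if bash -n -c "${snippet}"; then
  echo "syntax: ok"
fi

# Execution: Bash looks up `compare` on PATH. With PATH pointing nowhere
# the lookup fails and the subshell exits with status 127.
status=0
( PATH=/nonexistent; eval "${snippet}" ) 2>/dev/null || status=$?
echo "exit status: ${status}"
```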


compare expected to actual output and store result
if [ "$(zcat "${OUTPUT_DIR}/vip.vcf.gz" | grep -vc "^#")" -gt 0 ]; then
result="0"

The output report contains 1 record and is 5.4 MB, which might indicate a problem for a larger number of records. A possible cause: the genome browser displays many "white" reads, possibly related to "In IGV, reads with a mapping quality = 0 are displayed in white while reads with a positive mapping quality are displayed in grey".
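One way to test this hypothesis is to count how many reads in the report's input BAM have mapping quality 0. The sketch below is a hypothetical `count_mapq0` helper that reads SAM text on stdin and counts records whose MAPQ (the fifth SAM column) is 0; the example rows are made up and show only the first five SAM columns. On real data it could be fed with `samtools view -F 4 sample.bam` (excluding unmapped reads), while `samtools view -c -q 1 sample.bam` gives the complementary count of reads with MAPQ ≥ 1 directly.

```shell
#!/usr/bin/env bash
# Count SAM records with mapping quality (column 5) equal to 0.
# Header lines (starting with '@') are skipped.
count_mapq0() {
  awk -F '\t' '!/^@/ && $5 == 0 { n++ } END { print n + 0 }'
}

# Tiny hand-written example: QNAME, FLAG, RNAME, POS, MAPQ (hypothetical reads).
printf '%s\t%s\t%s\t%s\t%s\n' \
  read1 0 chr1 100 60 \
  read2 0 chr1 150 0 \
  read3 16 chr1 200 0 | count_mapq0   # prints 2
```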
