Panel bychr #18

Merged on Mar 22, 2024 (28 commits)
- f3fb024 Update input schema (LouisLeNezet, Mar 20, 2024)
- 5dfe3b5 Update all csv tests files (LouisLeNezet, Mar 20, 2024)
- 3c7238c Update configuration (LouisLeNezet, Mar 20, 2024)
- 106ee7a Update test csv (LouisLeNezet, Mar 20, 2024)
- 923efb2 Update test config (LouisLeNezet, Mar 20, 2024)
- 72dc8e1 Panel, map, and region now separated by chromosome (LouisLeNezet, Mar 20, 2024)
- 8d7968c Update vcf_impute_glimpse (LouisLeNezet, Mar 20, 2024)
- 36d3f59 Update vcf_phase_shapeit5 (LouisLeNezet, Mar 20, 2024)
- 6bf134d Update multiple_impute_glimpse2 (LouisLeNezet, Mar 20, 2024)
- dd34920 Update glimpse2_ligate (LouisLeNezet, Mar 20, 2024)
- 6bb635e Update glimpse chunk (LouisLeNezet, Mar 20, 2024)
- 056de1a Update logos (LouisLeNezet, Mar 20, 2024)
- e81230a Fix newline in csv (LouisLeNezet, Mar 20, 2024)
- 141f493 Fix json (LouisLeNezet, Mar 20, 2024)
- 5106d52 Update logos to match template branch (Mar 20, 2024)
- ccc123e Update changelog (LouisLeNezet, Mar 20, 2024)
- 3c20d0c Add suggestion and fix PR (LouisLeNezet, Mar 21, 2024)
- 7d8dc65 Merge branch 'dev' into panel_bychr (LouisLeNezet, Mar 21, 2024)
- e7dba6f Simulation downsampling working (LouisLeNezet, Mar 21, 2024)
- 846b486 Remove BAM_TO_GENOTYPE for the moment (LouisLeNezet, Mar 21, 2024)
- baf45d8 Update CHANGELOG (LouisLeNezet, Mar 21, 2024)
- ad40e1b Prettify (LouisLeNezet, Mar 21, 2024)
- 6a1d380 Update CI workflow (LouisLeNezet, Mar 21, 2024)
- 19fadbf Update CI workflow (LouisLeNezet, Mar 21, 2024)
- 0922f85 Update CI workflow (LouisLeNezet, Mar 21, 2024)
- bd5a9f6 Delete test_panelprep as integrate in imputation (LouisLeNezet, Mar 21, 2024)
- dc39d20 Delete anyOf in schema (Mar 22, 2024)
- c2883b5 Add task (Mar 22, 2024)
9 changes: 4 additions & 5 deletions .github/workflows/ci.yml
@@ -27,6 +27,9 @@ jobs:
NXF_VER:
- "23.04.0"
- "latest-everything"
TEST_PROFILE:
- "test"
- "test_sim"
steps:
- name: Check out pipeline code
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
@@ -40,12 +43,8 @@ jobs:
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1

- name: Run pipeline with test data
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
# Remember that you can parallelise this by using strategy.matrix
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
# Remember that you can parallelise this by using strategy.matrix
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
nextflow run ${GITHUB_WORKSPACE} -profile "${{ matrix.TEST_PROFILE }}",docker --outdir ./results
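Review note: the new `TEST_PROFILE` axis multiplies the CI jobs. The sketch below is a rough illustration in Python of how the matrix expands, assuming `NXF_VER` and `TEST_PROFILE` are the only two axes (the hunk shows only part of the `strategy` block). The helper name `expand_matrix` is hypothetical and is not part of the workflow.

```python
from itertools import product

# Matrix axes as they appear in the ci.yml hunk above.
# Assumption: no other axes are defined outside the visible hunk.
NXF_VER = ["23.04.0", "latest-everything"]
TEST_PROFILE = ["test", "test_sim"]


def expand_matrix(versions, profiles):
    """Return one job description per (Nextflow version, test profile) pair."""
    jobs = []
    for nxf_ver, profile in product(versions, profiles):
        # Mirrors the run step: the profile is substituted from the matrix.
        command = (
            f'nextflow run ${{GITHUB_WORKSPACE}} -profile "{profile}",docker '
            "--outdir ./results"
        )
        jobs.append({"NXF_VER": nxf_ver, "TEST_PROFILE": profile, "command": command})
    return jobs


jobs = expand_matrix(NXF_VER, TEST_PROFILE)
for job in jobs:
    print(job["NXF_VER"], "->", job["command"])
```

Under that assumption, each Nextflow version runs once with `test` and once with `test_sim`.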
10 changes: 8 additions & 2 deletions CHANGELOG.md
@@ -11,12 +11,18 @@ Initial release of nf-core/phaseimpute, created with the [nf-core](https://nf-co

### `Changed`

- [#15](https://github.com/nf-core/phaseimpute/pull/15) - Changed test csv files to point to nf-core repository
- [#16](https://github.com/nf-core/phaseimpute/pull/16) - Removed outdir from test config files
- [#18](https://github.com/nf-core/phaseimpute/pull/18)
- Maps and region by chromosome
- update tests config files
- correct meta map propagation
- Test impute and test sim works
- [#19](https://github.com/nf-core/phaseimpute/pull/19) - Changed reference panel to accept a csv, update modules and subworkflows (glimpse1/2 and shapeit5)

### `Fixed`

- [#15](https://github.com/nf-core/phaseimpute/pull/15) - Changed test csv files to point to nf-core repository
- [#16](https://github.com/nf-core/phaseimpute/pull/16) - Removed outdir from test config files

### `Dependencies`

### `Deprecated`
6 changes: 3 additions & 3 deletions assets/panel.csv
@@ -1,3 +1,3 @@
panel,vcf,index,sites,tsv,legend,phased
1000GP,1000GP.phased.vcf,1000GP.phased.vcf.csi,1000GP.sites,1000GP.tsv,,TRUE
1000GP_RePhase,1000GP.vcf,1000GP.vcf.csi,,,,FALSE
panel,chr,vcf,index
1000GP,chr21,1000GP_21.vcf,1000GP_21.vcf.csi
1000GP,chr22,1000GP_22.vcf,1000GP_22.vcf.csi
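Review note: with one row per chromosome, a consumer of the new sheet has to group rows by panel and chromosome. The following is a minimal Python sketch of that grouping, using the two example rows above. The function name `panel_by_chromosome` is hypothetical; the pipeline itself reads the sheet through its Nextflow schema, not through this code.

```python
import csv
import io
from collections import defaultdict

# The new per-chromosome layout of assets/panel.csv, copied from the diff.
PANEL_CSV = """panel,chr,vcf,index
1000GP,chr21,1000GP_21.vcf,1000GP_21.vcf.csi
1000GP,chr22,1000GP_22.vcf,1000GP_22.vcf.csi
"""


def panel_by_chromosome(text):
    """Group rows as {panel: {chr: (vcf, index)}}."""
    by_panel = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        by_panel[row["panel"]][row["chr"]] = (row["vcf"], row["index"])
    return dict(by_panel)


panels = panel_by_chromosome(PANEL_CSV)
for chrom, (vcf, index) in sorted(panels["1000GP"].items()):
    print(chrom, vcf, index)
```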
45 changes: 16 additions & 29 deletions assets/schema_input_panel.json
@@ -1,48 +1,35 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input.json",
"title": "nf-core/phaseimpute pipeline - params.input_region schema",
"description": "Schema for the file provided with params.input_region",
"$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input_panel.json",
"title": "nf-core/phaseimpute pipeline - params.panel schema",
"description": "Schema for the file provided with params.panel",
"type": "array",
"items": {
"type": "object",
"properties": {
"panel": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Panel name must be provided and cannot contain spaces",
"meta": ["panel"]
"errorMessage": "Panel name must be provided as a string and cannot contain spaces",
"meta": ["id"]
},
"vcf": {
"type": "string",
"pattern": "^\\S+\\.(vcf|bcf)(\\.gz)?$",
"errorMessage": "Panel vcf file must be provided, cannot contain spaces and must have extension '.vcf'"
},
"index": {
"type": "string",
"pattern": "^\\S+\\.(vcf|bcf)(\\.gz)?\\.(tbi|csi)$",
"errorMessage": "Panel vcf index file must be provided, cannot contain spaces and must have extension '.vcf.tbi' or '.vcf.csi'"
},
"sites": {
"chr": {
"type": "string",
"pattern": "^\\S+\\.sites(\\.bcf)?$",
"errorMessage": "Panel sites file must be provided, cannot contain spaces and must have extension '.sites'"
"pattern": "^\\S+$",
"errorMessage": "Chromosome must be provided as a string and cannot contain spaces",
"meta": ["chr"]
},
"tsv": {
"vcf": {
"type": "string",
"pattern": "^\\S+\\.tsv(\\.gz)?$",
"errorMessage": "Panel tsv file must be provided, cannot contain spaces and must have extension '.tsv'"
"pattern": "^\\S+\\.(vcf|bcf)(.gz)?$",
"errorMessage": "Panel file must be provided, cannot contain spaces and must have extension '.vcf' or '.bcf' with optional '.gz' extension"
},
"legend": {
"index": {
"type": "string",
"pattern": "^\\S+\\.legend$",
"errorMessage": "Panel legend file must be provided, cannot contain spaces and must have extension '.legend'"
},
"phased": {
"type": "boolean",
"errorMessage": "Is the vcf given phased? Must be a boolean"
"pattern": "^\\S+\\.(vcf|bcf)\\.(tbi|csi)$",
"errorMessage": "Panel index file must be provided, cannot contain spaces and must have extension '.vcf' or '.bcf' with '.csi' or '.tbi' extension"
}
},
"required": ["panel", "vcf", "index", "phased"]
"required": ["panel", "chr", "vcf", "index"]
}
}
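Review note: to see which file names the new panel patterns accept, the sketch below applies them with Python's `re` module. It is a plain-regex approximation, not the schema validator the pipeline uses. The patterns are copied from the new side of the diff with the JSON escaping removed. The function `check_panel_row` and the choice to treat empty cells as missing are assumptions made for illustration.

```python
import re

# Patterns from the new schema_input_panel.json, JSON escaping removed.
PANEL_PATTERNS = {
    "panel": r"^\S+$",
    "chr": r"^\S+$",
    "vcf": r"^\S+\.(vcf|bcf)(.gz)?$",
    "index": r"^\S+\.(vcf|bcf)\.(tbi|csi)$",
}
REQUIRED = ["panel", "chr", "vcf", "index"]


def check_panel_row(row):
    """Return a list of error strings for one samplesheet row (empty if valid)."""
    errors = []
    for field in REQUIRED:
        value = row.get(field)
        # Assumption: an empty cell counts as a missing value.
        if value is None or value == "":
            errors.append(f"{field}: missing")
        elif not re.search(PANEL_PATTERNS[field], value):
            errors.append(f"{field}: '{value}' does not match {PANEL_PATTERNS[field]}")
    return errors


plain_row = {"panel": "1000GP", "chr": "chr21",
             "vcf": "1000GP_21.vcf", "index": "1000GP_21.vcf.csi"}
gz_row = {"panel": "1000GP", "chr": "chr21",
          "vcf": "1000GP_21.vcf.gz", "index": "1000GP_21.vcf.gz.tbi"}

print(check_panel_row(plain_row))
print(check_panel_row(gz_row))
```

With the patterns as written, a bgzipped panel such as `1000GP_21.vcf.gz` passes the `vcf` check but its `1000GP_21.vcf.gz.tbi` index is rejected, because the `index` pattern has no optional `.gz` group. The dot in `(.gz)?` is also unescaped in the `vcf` pattern, so it matches any character before `gz`.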
15 changes: 4 additions & 11 deletions assets/schema_input_region.json
@@ -1,23 +1,16 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input.json",
"$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input_region.json",
"title": "nf-core/phaseimpute pipeline - params.input_region schema",
"description": "Schema for the file provided with params.input_region",
"type": "array",
"items": {
"type": "object",
"properties": {
"chr": {
"anyOf": [
{
"type": "string",
"pattern": "^\\S+$"
},
{
"type": "integer",
"pattern": "^\\d+$"
}
]
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Chromosome name must be provided as a string and cannot contain spaces"
},
"start": {
"type": "integer",
24 changes: 24 additions & 0 deletions assets/schema_map.json
@@ -0,0 +1,24 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_map.json",
"title": "nf-core/phaseimpute pipeline - params.map schema",
"description": "Schema for the file provided with params.map",
"type": "array",
"items": {
"type": "object",
"properties": {
"chr": {
"type": "string",
"pattern": "^(chr)?[0-9]+$",
"errorMessage": "Chromosome must be provided and must be a string containing only numbers, with or without the prefix 'chr'",
"meta": ["chr"]
},
"map": {
"type": "string",
"pattern": "^\\S+\\.(g)?map(\\.gz)?$",
"errorMessage": "Map file must be provided, cannot contain spaces and must have extension '.map' or '.gmap' with optional 'gz' extension"
}
},
"required": ["chr", "map"]
}
}
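Review note: the map schema is stricter on `chr` than the panel and region schemas, since it only allows digits with an optional `chr` prefix. Here is a small Python sketch of what the two patterns accept, again using plain `re` rather than the pipeline's validator. The function name `check_map_row` and the sample rows are illustrative assumptions.

```python
import re

# Patterns from the new schema_map.json, JSON escaping removed.
MAP_PATTERNS = {
    "chr": r"^(chr)?[0-9]+$",
    "map": r"^\S+\.(g)?map(\.gz)?$",
}


def check_map_row(row):
    """Return the names of the fields that are missing or fail their pattern."""
    failed = []
    for field, pattern in MAP_PATTERNS.items():
        value = row.get(field, "")
        if not re.search(pattern, value):
            failed.append(field)
    return failed


rows = [
    {"chr": "chr21", "map": "chr21.b38.gmap.gz"},  # numeric chromosome, gzipped gmap
    {"chr": "21", "map": "chr21.map"},             # prefix is optional
    {"chr": "chrX", "map": "chrX.b38.gmap.gz"},    # non-numeric chromosome name
]
for row in rows:
    print(row["chr"], check_map_row(row))
```

As written, a sex chromosome such as `chrX` fails the `chr` pattern even though its map file name is valid.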
6 changes: 3 additions & 3 deletions conf/modules.config
@@ -35,13 +35,13 @@ process {
]
}

// Simulate workflow
withName: VIEW_REGION {
// Simulation workflow
withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_REGION:SAMTOOLS_VIEW' {
ext.args = [
].join(' ')
ext.prefix = { "${meta.id}_R${meta.region}" }
}
withName: VIEW_DEPTH {
withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_DOWNSAMPLE:SAMTOOLS_VIEW' {
ext.args = [
].join(' ')
ext.prefix = { "${meta.id}_D${meta.depth}" }
13 changes: 7 additions & 6 deletions conf/test.config
@@ -16,16 +16,17 @@ params {

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'
max_memory = '2.GB'
max_time = '1.h'

// Input data
input = "${projectDir}/tests/csv/bam.csv"
input = "${projectDir}/tests/csv/sample_bam.csv"
input_region = "${projectDir}/tests/csv/region.csv"

// Genome references
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa"
panel = "${projectDir}/tests/csv/panel.csv"
phased = true
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa"
panel = "${projectDir}/tests/csv/panel.csv"
phased = true

// Impute parameters
step = "impute"
10 changes: 3 additions & 7 deletions conf/test_full.config
@@ -14,23 +14,19 @@ params {
config_profile_name = 'Full test profile'
config_profile_description = 'Full test dataset to check pipeline function'

// Input data for full size test
// TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
// TODO nf-core: Give any required params for the test so that command line flags are not needed

// Genome references
map = "/groups/dog/llenezet/test-datasets/data/genetic_maps.b38/chr21.b38.gmap.gz"
map = "https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/plink.GRCh38.map.zip"
genome = "GRCh38"
fasta = "/groups/dog/llenezet/script/phaseimpute/data/genome.fa"

// Resources increase incompatible with Github Action
max_cpus = 12
max_memory = '50.GB'
max_time = '6.h'

// Input data
input = "tests/csv/sample_sim.csv"
panel = "tests/csv/panel.csv"
input = "${projectDir}/tests/csv/sample_sim_full.csv"
panel = "${projectDir}/tests/csv/panel_full.csv"
input_region_string = "all"
step = "simulate"
}
32 changes: 0 additions & 32 deletions conf/test_panelprep.config

This file was deleted.

11 changes: 5 additions & 6 deletions conf/test_sim.config
@@ -20,13 +20,12 @@ params {
max_time = '6.h'

// Input data
input = "tests/csv/sample_sim.csv"
input_region_file = "tests/csv/regionsheet.csv"
depth = [1, 2]
genome = "GRCh38"
input = "${projectDir}/tests/csv/sample_sim.csv"
input_region = "${projectDir}/tests/csv/region.csv"
depth = 1

map = "/groups/dog/llenezet/test-datasets/data/genetic_maps.b38/chr21.b38.gmap.gz"
fasta = "/groups/dog/llenezet/test-datasets/data/reference_genome/hs38DH.chr21.fa"
map = "${projectDir}/tests/csv/map.csv"
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa"

step = "simulate"
}
30 changes: 17 additions & 13 deletions docs/development.md
@@ -1,18 +1,21 @@
# Development

To contribute to this pipeline you will need to install the development environment:
This is possible only on Linux or macOS machines, as Nextflow only works on these platforms.

```bash
conda env create -f environment.yml
conda activate nf-core-phaseimpute-1.0dev
```

## Add new module

```bash
nf-core modules install
```
## Features and tasks

- [] Add automatic detection of chromosome name to create a renaming file for the vcf
- [] Make the different tests workflows work
- [] Simulation
- [] Validation
- [] Preprocessing
- [x] Imputation
- [] Validation
- [] Postprocessing
- [] Add support of `anyOf()` or `oneOf()` in the nf-core schema for the map, panel and region files
- [] Add nf-test for all modules and subworkflows
- [] Remove all TODOs
- [] Check if panel is necessary depending on the tool selected
- [] Set modules configuration as full path workflow:subworkflow:module
- [] Where should the map file go (separate csv or in panel csv)

## Run tests

@@ -36,6 +39,7 @@ All channel need to be identified by a meta map as follow:
- M : map used
- T : tool used
- G : reference genome used (is it needed ?)
- D : depth

## Open questions

5 changes: 4 additions & 1 deletion main.nf
@@ -36,7 +36,8 @@ workflow NFCORE_PHASEIMPUTE {
ch_input // channel: samplesheet read in from --input
ch_fasta // channel: reference genome FASTA file with index
ch_panel // channel: reference panel variants file
ch_regions // channel: regions to use [meta, region]
ch_regions // channel: regions to use [[chr, region], region]
ch_depth // channel: depth of coverage file [[depth], depth]
ch_map // channel: map file for imputation
ch_versions // channel: versions of software used

@@ -49,6 +50,7 @@ workflow NFCORE_PHASEIMPUTE {
ch_fasta,
ch_panel,
ch_regions,
ch_depth,
ch_map,
ch_versions
)
@@ -89,6 +91,7 @@ workflow {
PIPELINE_INITIALISATION.out.fasta,
PIPELINE_INITIALISATION.out.panel,
PIPELINE_INITIALISATION.out.regions,
PIPELINE_INITIALISATION.out.depth,
PIPELINE_INITIALISATION.out.map,
PIPELINE_INITIALISATION.out.versions
)
3 changes: 1 addition & 2 deletions modules/nf-core/samtools/coverage/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion nextflow.config
@@ -14,7 +14,7 @@ params {

// Input options
input = null
input_region = "all"
input_region = null
map = null
tools = null

@@ -191,6 +191,7 @@ profiles {
}
test { includeConfig 'conf/test.config' }
test_full { includeConfig 'conf/test_full.config' }
test_sim { includeConfig 'conf/test_sim.config' }
}

// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile