Merging new changes to master from development branch #197

Merged 37 commits on Jan 23, 2025
Commits
3e85fb1
updated for timeseries
anwarMZ Dec 19, 2024
b49df37
updated path to metadata conf
anwarMZ Dec 20, 2024
ec2fcb6
updated timeseries params
anwarMZ Dec 20, 2024
1708343
updated tsv2pdf input tuple
anwarMZ Dec 20, 2024
58af258
fixed error for timeseries pdf
anwarMZ Dec 20, 2024
a00fab9
updated for timeseries data
anwarMZ Dec 20, 2024
85f2cda
updated conatiner for yaml
anwarMZ Dec 20, 2024
4299a23
updated errors
anwarMZ Dec 20, 2024
12fc7e3
updated timeseries test
anwarMZ Dec 20, 2024
88fb780
updated timeseries test
anwarMZ Dec 20, 2024
f477e4c
updated timeseries test
anwarMZ Dec 20, 2024
c1c8f92
updated timeseries test
anwarMZ Dec 20, 2024
d1ac271
updated timeseries test
anwarMZ Dec 20, 2024
8816b50
updated timeseries test
anwarMZ Dec 20, 2024
eb3ab4a
updated parameteres and testing data for sars-cov-w and mpox
anwarMZ Jan 21, 2025
d2d9cc2
updated local modules structure to allow mapping variant files
anwarMZ Jan 21, 2025
20debb1
updated sub-workflow structure to allow multiple mapping when files a…
anwarMZ Jan 21, 2025
16f0725
updated config files for eagle, modules and parameters
anwarMZ Jan 21, 2025
fff3e0f
udpated asset files
anwarMZ Jan 21, 2025
5ba59f3
updated wastewater workflow
anwarMZ Jan 21, 2025
6cc6704
updated workflows to allow mapping
anwarMZ Jan 21, 2025
f5b9575
updated tests in the actions workflow
anwarMZ Jan 21, 2025
9aabe7b
updated poxmvp testdata
anwarMZ Jan 21, 2025
e178941
updated requirements
anwarMZ Jan 21, 2025
dfaeaf7
updated poxmvp testdata
anwarMZ Jan 21, 2025
6ba8897
updated test data
anwarMZ Jan 21, 2025
73ffe26
updated mpox test_data
anwarMZ Jan 21, 2025
125418e
gzipped metadata
anwarMZ Jan 21, 2025
43fdc4e
updated poxmvp test_data
anwarMZ Jan 21, 2025
cf5a826
dehosting is optiona;
anwarMZ Jan 22, 2025
3e080ea
reading host genome optionally
anwarMZ Jan 22, 2025
f491af5
updated figure
anwarMZ Jan 22, 2025
652a262
updated structure
anwarMZ Jan 22, 2025
6fe44c8
updated documentation
anwarMZ Jan 22, 2025
2c0206d
updated freyja parameter
anwarMZ Jan 22, 2025
1528efd
updated workflow
anwarMZ Jan 23, 2025
6126eee
updated test data for wastewater
anwarMZ Jan 23, 2025
86 changes: 76 additions & 10 deletions .github/workflows/nextflow_CI.yml
@@ -1,16 +1,14 @@
name: Nextflow CI

on:
push:
branches:
- development
pull_request:
branches:
- development
- master

jobs:
test_sarscov2_user:
name: Run pipeline test (user)
name: Run pipeline test (sarscov2_user)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
@@ -20,12 +18,72 @@ jobs:
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

- name: Run pipeline test (user)
- name: Run pipeline test (sarscov2_user)
run: |
nextflow run main.nf -profile docker --prefix "covidmvp-user-$(date +%Y-%m-%d)" -params-file covidmvp_user.yaml
nextflow run main.nf -profile docker --prefix "covidmvp-user-$(date +%Y-%m-%d)" -params-file covidmvp_user_params.yaml

test_sarscov2_reference:
name: Run pipeline test (reference)
name: Run pipeline test (sarscov2_reference)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

- name: Run pipeline test (sarscov2_reference)
run: |
nextflow run main.nf -profile docker --prefix "covidmvp-clinical-$(date +%Y-%m-%d)" --end_date $(date +%Y-%m-%d) -params-file covidmvp_clinical_params.yaml

test_sarscov2_timeseries:
name: Run pipeline test (sarscov2_timeseries)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

- name: Run pipeline test (sarscov2_timeseries)
run: |
nextflow run main.nf -profile docker --prefix "covidmvp-timeseries-$(date +%Y-%m-%d)" -params-file covidmvp_timeseries_params.yaml

test_sarscov2_wastewater:
name: Run pipeline test (sarscov2_wastewater)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

- name: Run pipeline test (sarscov2_wastewater)
run: |
nextflow run main.nf -profile docker --prefix "covidmvp-wastewater-$(date +%Y-%m-%d)" -params-file covidmvp_wastewater_params.yaml

test_mpox_user:
name: Run pipeline test (mpox_user)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

- name: Run pipeline test (mpox_user)
run: |
nextflow run main.nf -profile docker --prefix "poxmvp-user-$(date +%Y-%m-%d)" -params-file poxmvp_user_params.yaml

test_mpox_reference:
name: Run pipeline test (mpox_reference)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
@@ -35,13 +93,21 @@ jobs:
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

- name: Run pipeline test (reference)
- name: Run pipeline test (mpox_reference)
run: |
nextflow run main.nf -profile docker --prefix "covidmvp-$(date +%Y-%m-%d)" --end_date $(date +%Y-%m-%d) -params-file covidmvp_clinical_params.yaml
nextflow run main.nf -profile docker --prefix "poxmvp-reference-$(date +%Y-%m-%d)" -params-file poxmvp_clinical_params.yaml

check_success:
name: Check if all tests passed
needs: [test_sarscov2_user, test_sarscov2_reference]
needs:
[
test_sarscov2_user,
test_sarscov2_reference,
test_sarscov2_timeseries,
test_sarscov2_wastewater,
test_mpox_reference,
test_mpox_user
]
runs-on: ubuntu-latest
steps:
- name: Check job status
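
The `check_success` job now waits on all six test jobs listed under `needs`. The body of its `Check job status` step is not shown in this diff, so the following is only a minimal Python sketch of the gating rule the job implies, assuming each job reports a result string such as `success` or `failure`. The names `REQUIRED_JOBS`, `all_tests_passed`, and `failed_jobs` are hypothetical and do not come from the repository.

```python
# Hypothetical illustration only; the real check runs inside GitHub Actions.
# Job names are copied from the `needs` list of `check_success`.
REQUIRED_JOBS = [
    "test_sarscov2_user",
    "test_sarscov2_reference",
    "test_sarscov2_timeseries",
    "test_sarscov2_wastewater",
    "test_mpox_reference",
    "test_mpox_user",
]


def all_tests_passed(results):
    """Return True only if every required job reported 'success'."""
    return all(results.get(job) == "success" for job in REQUIRED_JOBS)


def failed_jobs(results):
    """List required jobs that are missing or did not succeed, in `needs` order."""
    return [job for job in REQUIRED_JOBS if results.get(job) != "success"]
```

Treating a missing result as a failure keeps the gate strict: a job that was skipped or never reported cannot let the check pass.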
7 changes: 0 additions & 7 deletions .gitignore
@@ -3,16 +3,9 @@ results/*
.idea/*
.DS_store
*.ipynb
weekly*
*.error
*.out
bin/__*
*.sh
latest*
*.fa.xz
*.gz
hMPXV_*
bin/web.log
assets/config.ini
*.xz
*.log
217 changes: 89 additions & 128 deletions README.md

Large diffs are not rendered by default.

97 changes: 97 additions & 0 deletions assets/archive/metadata.json
@@ -0,0 +1,97 @@
{
"Format": {
"sep": "tab",
"suffix": ".gz"
},
"Required": {
"isolate": true,
"lineage": true,
"sample_collection_date": true,
"country": true,
"state_province_territory": true
},
"Optional": {
"gene_name": false,
"organism": true,
"host_scientific_name": true,
"host_gender": false,
"host_age_bin": false,
"submission_date": false,
"gisaid_accession": true,
"length": false,
"purpose_of_sampling": false,
"purpose_of_sequencing": false,
"alias": true,
"Type": false,
"Subtype": false,
"SubLineageGroup": false,
"Clade": false,
"last_updated": true
},
"ViralAi_EpiCoV": {
"isolate": "isolate",
"lineage": "raw_lineage",
"organism": "organism",
"gisaid_accession": "gisaid_accession",
"host_scientific_name": "host_scientific_name",
"sample_collection_date": "sample_collection_date",
"country": "geo_loc_name_country",
"state_province_territory": "geo_loc_name_state_province_territory",
"alias": "lineage",
"purpose_of_sampling": "purpose_of_sampling_details",
"purpose_of_sequencing": "purpose_of_sequencing_details",
"last_updated": "last_updated"
},
"Gisaid_EpiCoV": {
"isolate": "isolate",
"lineage": "raw_lineage",
"organism": "organism",
"gisaid_accession": "gisaid_accession",
"host_scientific_name": "host_scientific_name",
"sample_collection_date": "sample_collection_date",
"country": "geo_loc_name_country",
"state_province_territory": "geo_loc_name_state_province_territory",
"alias": "lineage",
"purpose_of_sampling": "purpose_of_sampling_details",
"purpose_of_sequencing": "purpose_of_sequencing_details"
},
"Gisaid_EpiPox": {
"isolate": "isolate",
"lineage": "raw_lineage",
"organism": "organism",
"gisaid_accession": "gisaid_accession",
"host_scientific_name": "host_scientific_name",
"sample_collection_date": "sample_collection_date",
"country": "geo_loc_name_country",
"state_province_territory": "geo_loc_name_state_province_territory",
"alias": "lineage",
"purpose_of_sampling": "purpose_of_sampling_details",
"purpose_of_sequencing": "purpose_of_sequencing_details"
},
"NCBIVirus_EpiPox": {
"isolate": "isolate",
"lineage": "raw_lineage",
"organism": "organism",
"gisaid_accession": "gisaid_accession",
"host_scientific_name": "host_scientific_name",
"sample_collection_date": "sample_collection_date",
"country": "geo_loc_name_country",
"state_province_territory": "geo_loc_name_state_province_territory",
"alias": "lineage",
"purpose_of_sampling": "purpose_of_sampling_details",
"purpose_of_sequencing": "purpose_of_sequencing_details"
},
"Gisaid_EpiFlu": {
"isolate": "isolate",
"lineage": "raw_lineage",
"organism": "organism",
"gisaid_accession": "gisaid_accession",
"host_scientific_name": "host_scientific_name",
"sample_collection_date": "sample_collection_date",
"country": "geo_loc_name_country",
"state_province_territory": "geo_loc_name_state_province_territory",
"alias": "lineage",
"purpose_of_sampling": "purpose_of_sampling_details",
"purpose_of_sequencing": "purpose_of_sequencing_details"
}
}
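
Each source section in this file appears to map a standardized column name (the key) to the column name used by that source (the value). That direction is an assumption read off entries such as `"country": "geo_loc_name_country"`; the diff does not show the code that consumes the file. Under that assumption, a minimal Python sketch of renaming one record could look like this, where `standardize_row` and the sample values are hypothetical.

```python
# Copied from the "ViralAi_EpiCoV" section above.
# Assumption: keys are standardized names, values are source column names.
VIRALAI_EPICOV = {
    "isolate": "isolate",
    "lineage": "raw_lineage",
    "organism": "organism",
    "gisaid_accession": "gisaid_accession",
    "host_scientific_name": "host_scientific_name",
    "sample_collection_date": "sample_collection_date",
    "country": "geo_loc_name_country",
    "state_province_territory": "geo_loc_name_state_province_territory",
    "alias": "lineage",
    "purpose_of_sampling": "purpose_of_sampling_details",
    "purpose_of_sequencing": "purpose_of_sequencing_details",
    "last_updated": "last_updated",
}


def standardize_row(row, mapping):
    """Build a record keyed by standardized names; skip absent source columns."""
    return {
        standard: row[source]
        for standard, source in mapping.items()
        if source in row
    }


# Example record using source-side column names (made-up values).
source_row = {
    "isolate": "sample-001",
    "raw_lineage": "BA.2.86",
    "lineage": "Omicron",
    "geo_loc_name_country": "Canada",
}
standardized = standardize_row(source_row, VIRALAI_EPICOV)
```

Building a new record instead of renaming in place matters here: the source `lineage` column feeds `alias` while `raw_lineage` feeds `lineage`, so an in-place rename could overwrite one with the other.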
48 changes: 48 additions & 0 deletions assets/metadata_conf/metadata_flu.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# This file contains a high-level summary of metadata columns across different sources.
# The metadata file should be tab-delimited, with NO spaces in column headers: underscore (_) is used as best practice.
# The file is gzip-compressed.

# File format and required metadata columns
Format:
  sep: "tab" # options: tab
  suffix: ".gz"

Required:
  isolate: True
  lineage: True
  sample_collection_date: True
  country: True
  state_province_territory: True

# Optional metadata columns
Optional:
  gene_name: False
  organism: True
  host_scientific_name: True
  host_gender: False
  host_age_bin: False
  submission_date: False
  gisaid_accession: True
  length: False
  purpose_of_sampling: False
  purpose_of_sequencing: False
  alias: True
  Type: False
  Subtype: False
  SubLineageGroup: False
  Clade: False
  last_updated: True

# Gisaid_EpiFlu metadata columns
Gisaid_EpiFlu:
  isolate: "isolate"
  lineage: "raw_lineage"
  organism: "organism"
  gisaid_accession: "gisaid_accession"
  host_scientific_name: "host_scientific_name"
  sample_collection_date: "sample_collection_date"
  country: "geo_loc_name_country"
  state_province_territory: "geo_loc_name_state_province_territory"
  alias: "lineage"
  purpose_of_sampling: "purpose_of_sampling_details"
  purpose_of_sequencing: "purpose_of_sequencing_details"
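
The `Format` section and the header comments describe the expected input: a gzip-compressed, tab-delimited file whose column headers contain no spaces. A minimal Python sketch of a reader that enforces those three rules follows. It uses only the standard library; `read_metadata`, `FORMAT`, and `SEPARATORS` are hypothetical names, and treating `"tab"` as the only separator option is an assumption based on the inline comment.

```python
import csv
import gzip

# Mirrors the Format section above.
FORMAT = {"sep": "tab", "suffix": ".gz"}
# Assumption: "tab" is the only separator name the config allows.
SEPARATORS = {"tab": "\t"}


def read_metadata(path, fmt=FORMAT):
    """Read a gzip-compressed, delimited metadata file into a list of dicts."""
    if not str(path).endswith(fmt["suffix"]):
        raise ValueError(f"expected a file ending in {fmt['suffix']}: {path}")
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=SEPARATORS[fmt["sep"]])
        rows = list(reader)
        headers = reader.fieldnames or []
    # The config comments ask for underscores instead of spaces in headers.
    bad = [name for name in headers if " " in name]
    if bad:
        raise ValueError(f"column headers must not contain spaces: {bad}")
    return rows
```

Failing early on a wrong suffix or a header with spaces gives a clearer error than a later lookup failure on a column name.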
76 changes: 76 additions & 0 deletions assets/metadata_conf/metadata_mpox.yaml
@@ -0,0 +1,76 @@
# This file contains a high-level summary of metadata columns across different sources.
# The metadata file should be tab-delimited, with NO spaces in column headers: underscore (_) is used as best practice.
# The file is gzip-compressed.

# File format and required metadata columns
Format:
  sep: "tab" # options: tab
  suffix: ".gz"

Required:
  isolate: True
  lineage: True
  sample_collection_date: True
  country: True
  state_province_territory: True

# Optional metadata columns
Optional:
  gene_name: False
  organism: True
  host_scientific_name: True
  host_gender: False
  host_age_bin: False
  submission_date: False
  gisaid_accession: False
  length: False
  purpose_of_sampling: False
  purpose_of_sequencing: False
  alias: False
  Type: False
  Subtype: False
  SubLineageGroup: False
  clade: True
  last_updated: False

# Pathoplexus_mpox metadata columns
Pathoplexus_mpox:
  isolate: "accessionversion"
  lineage: "lineage"
  organism: "ncbivirusname"
  host_scientific_name: "hostnamescientific"
  sample_collection_date: "samplecollectiondate"
  country: "geoloccountry"
  state_province_territory: "geoloccity"
  alias: "lineage"
  purpose_of_sampling: "purposeofsampling"
  purpose_of_sequencing: "purposeofsequencing"
  clade: "clade"

# Gisaid_EpiPox metadata columns
Gisaid_EpiPox:
  isolate: "isolate"
  lineage: "raw_lineage"
  organism: "organism"
  gisaid_accession: "gisaid_accession"
  host_scientific_name: "host_scientific_name"
  sample_collection_date: "sample_collection_date"
  country: "geo_loc_name_country"
  state_province_territory: "geo_loc_name_state_province_territory"
  alias: "lineage"
  purpose_of_sampling: "purpose_of_sampling_details"
  purpose_of_sequencing: "purpose_of_sequencing_details"

# NCBIVirus_EpiPox metadata columns
NCBIVirus_EpiPox:
  isolate: "isolate"
  lineage: "raw_lineage"
  organism: "organism"
  gisaid_accession: "gisaid_accession"
  host_scientific_name: "host_scientific_name"
  sample_collection_date: "sample_collection_date"
  country: "geo_loc_name_country"
  state_province_territory: "geo_loc_name_state_province_territory"
  alias: "lineage"
  purpose_of_sampling: "purpose_of_sampling_details"
  purpose_of_sequencing: "purpose_of_sequencing_details"
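
With several source sections in one file, a quick consistency check is to confirm that each source maps every column flagged `True` under `Required`. The sketch below does that for the `Pathoplexus_mpox` section, with the values transcribed into Python dictionaries so that no YAML library is needed. `unmapped_required` is a hypothetical helper, and reading the section keys as standardized column names is an assumption.

```python
# Transcribed from metadata_mpox.yaml above.
REQUIRED = {
    "isolate": True,
    "lineage": True,
    "sample_collection_date": True,
    "country": True,
    "state_province_territory": True,
}

# Assumption: keys are standardized names, values are Pathoplexus column names.
PATHOPLEXUS_MPOX = {
    "isolate": "accessionversion",
    "lineage": "lineage",
    "organism": "ncbivirusname",
    "host_scientific_name": "hostnamescientific",
    "sample_collection_date": "samplecollectiondate",
    "country": "geoloccountry",
    "state_province_territory": "geoloccity",
    "alias": "lineage",
    "purpose_of_sampling": "purposeofsampling",
    "purpose_of_sequencing": "purposeofsequencing",
    "clade": "clade",
}


def unmapped_required(required, mapping):
    """Return required standardized columns that the source mapping omits."""
    return [
        name
        for name, needed in required.items()
        if needed and name not in mapping
    ]


# An empty list means the source covers every required column.
missing = unmapped_required(REQUIRED, PATHOPLEXUS_MPOX)
```

The same check could be run against `Gisaid_EpiPox` and `NCBIVirus_EpiPox` whenever a new source section is added.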