Releases: molgenis/NGS_DNA
- removed arguments from CollectHsMetrics that were causing problems with the
- replacing missing backslash before a quote in some header lines of a vcf (
- searching in the wrong column for "MEAN_TARGET_COVERAGE" in the
- searching in a different folder for the qc files in
- missing bedfile annotation in the header of a vcf ( &
- ConcordanceCheck SNPs
- Gavin standalone script &
- added new protocols
- PreparedDragenData
- CombineDragenSampleData
- ReheaderVcf
- new workflow_DRAGEN
- new generate_DragenScripts
From this version on, the folders Samplesheets, tmp, project and generatedscripts have a subfolder with the name of the pipeline (e.g. NGS_DNA, NGS_Demultiplexing, VIP, GAP etc.). This was implemented because one machine now does both the rawdata handling (AGCT/NGS_Demultiplexing) and the variant calling (GAP/NGS_DNA). Data archiving is still done on a separate machine (chaperone).
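The per-pipeline layout described above can be sketched as follows. This is a minimal illustration only: the root directory `/groups/example`, the function name `pipeline_folders`, and the assumption that the four folders sit side by side under one root are hypothetical and not taken from the release notes.

```python
from pathlib import PurePosixPath
from typing import Dict

# Folder names as listed in the release note.
TOP_LEVEL_FOLDERS = ("Samplesheets", "tmp", "project", "generatedscripts")


def pipeline_folders(root: str, pipeline: str) -> Dict[str, PurePosixPath]:
    """Return the pipeline-specific subfolder for each top-level folder.

    Assumption: all four folders live directly under the same root.
    """
    base = PurePosixPath(root)
    return {folder: base / folder / pipeline for folder in TOP_LEVEL_FOLDERS}


if __name__ == "__main__":
    # Hypothetical root; prints one path per top-level folder.
    for folder, path in pipeline_folders("/groups/example", "NGS_DNA").items():
        print(f"{folder}: {path}")
```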
Getting rawdata from permanent storage changed
Diagnostic clusters cannot talk directly to permanent storage anymore (i.e. permanent storage can push only).
- Created new protocol: CheckRawDataOnTmp
This protocol replaces CopyPrmToTmpData. It checks whether the rawdata is already available; if not, a file is created in the logs folder as ${logsDir}/${project}/${project}.data.requested. This file type is checked by a new NGS_Automated protocol (copyRawDataToTmp), which then pushes the data to the diagnostic cluster.
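The request mechanism described above can be sketched as follows. This is an illustration in Python, not the protocol's actual code: the function name `check_raw_data_on_tmp`, the way availability is tested, and the contents written to the request file are assumptions. Only the `${logsDir}/${project}/${project}.data.requested` path pattern comes from the text.

```python
from pathlib import Path
from typing import Iterable, Optional


def check_raw_data_on_tmp(
    raw_data_files: Iterable[str], logs_dir: str, project: str
) -> Optional[Path]:
    """Create a '<project>.data.requested' file when rawdata is missing.

    Returns the path of the request file, or None when all rawdata is present.
    """
    # Assumption: "available" simply means the file exists on the tmp storage.
    missing = [f for f in raw_data_files if not Path(f).is_file()]
    if not missing:
        return None

    request_file = Path(logs_dir) / project / f"{project}.data.requested"
    request_file.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: listing the missing files in the request file; the release
    # note only states that the file is created.
    request_file.write_text("\n".join(missing) + "\n")
    return request_file
```

In this sketch the diagnostic cluster never contacts permanent storage itself; it only leaves a marker file that another process can pick up, which matches the push-only constraint described above.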
coverage calculations per gVCF (instead of per BAM)
This script uses gVCF files as input. Because reads can be rearranged in various ways during variant calling, the BAM files created by the pipeline are not always the end product. It is therefore more accurate to calculate coverage from the gVCF files created by the variant caller.
The end product is unchanged and still has this format:
"Index\tChr\tChr Position Start\tChr Position End\tAverage Counts\tDescription\tReference Length\tCDS\tContig"
In addition, extra file(s) are created for calculating the percentage/number of bases with zero coverage, dp<20, and GQ values (.{bedfile}.CoverageOutput.csv)
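The kind of summary these calculations produce can be sketched as follows. This is a minimal illustration, not the pipeline's actual script: it assumes non-overlapping depth blocks `(start, end, dp)` with 1-based inclusive coordinates, as might be derived from gVCF reference blocks, and it treats positions not described by any block as zero coverage. The function name and output keys are hypothetical, and GQ is left out for brevity.

```python
from typing import Dict, Iterable, Tuple

Block = Tuple[int, int, int]  # (start, end, dp), 1-based inclusive


def summarize_target(
    blocks: Iterable[Block], target_start: int, target_end: int, min_dp: int = 20
) -> Dict[str, float]:
    """Summarize depth over one target region from non-overlapping blocks."""
    target_len = target_end - target_start + 1
    depth_sum = 0   # sum of depth over all target bases
    described = 0   # target bases described by some block
    zero = 0        # target bases with dp == 0
    low = 0         # target bases with dp < min_dp

    for start, end, dp in blocks:
        # Clip the block to the target region.
        lo, hi = max(start, target_start), min(end, target_end)
        if lo > hi:
            continue
        n = hi - lo + 1
        described += n
        depth_sum += dp * n
        if dp == 0:
            zero += n
        if dp < min_dp:
            low += n

    # Assumption: bases without any block count as zero coverage.
    undescribed = target_len - described
    zero += undescribed
    low += undescribed

    return {
        "average_depth": depth_sum / target_len,
        "pct_zero": 100.0 * zero / target_len,
        "pct_below_min_dp": 100.0 * low / target_len,
    }
```

For example, with blocks `[(1, 10, 30), (11, 20, 10)]` and a target of positions 1 to 20, half of the bases fall below the default threshold of 20.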
Diploid chrX
- Male chromosome X will be called diploid
This means there is no need for 2 separate chr X batches (nonPar and Par).
updated tool versions:
bcfTools 1.16
BEDTools 2.30.0
gVCF2BED 1.0.0 (new tool)
HTSlib 1.16
ngs-utils 23.04.1
picard 2.26.0
Python 3.10.4
R 4.2.1
SAMTools 1.16.1
vcfAnno v0.3.3
- moved unused protocols to deprecated and removed them from the workflow.csv
NGS_DNA 3.7.0 (Zebra)
ploidy is 2 for male chromosome X (to detect possible mosaic variants)
NGS_DNA 4.1.0 (Winged-Helix)
Important changes:
- Male chromosome X will be called diploid
- This means there is no need for 2 separate chr X batches (nonPar and Par)
- Added coverage calculations based on gVCF instead of bam.
- new protocol CoverageCalculations_gvcf is added (a standalone version is also available in the ngs-utils 22.01.1 release)
Other changes:
- Arguments for GATK and Picard updated ('=' sign is not allowed anymore)
- shellcheck warning free
- updated tool versions:
- bcfTools 1.14
- BEDTools 2.30.0
- gVCF2BED 1.0.0 (new tool)
- HTSlib 1.14
- ngs-utils 22.01.1
- picard 2.26.0
- Python 3.10.2
- R 4.0.3
- SAMTools 1.14
- vcfAnno v0.3.3
- moved unused protocols to deprecated and removed them from the workflow.csv
- vcfToTable
- CoverageCalculations
- Convading
- DecisionTree
4.0.5 (updated version for Winged Helix)
- all parameters have been updated to the correct toolchain/javasuffix
- added parameters_wingedhelix.csv file
removed the requirement for a group parameters config file in the pipeline; the group is now assigned automatically, based on the group in which generate_template is run.
removal of unused workflows/files
NGS_DNA 3.6.0 (X-Ray tetra)
- gender specific result folders for coverage calculations (results/coverageCoveragePer{Base,Target}/{male,female,unknown})
- small fixes for wgs data
- bugfixes in curl response for track and trace database
- gnomAD 2.1.1 (was 2.0.2; some fields were renamed)
NGS_DNA 4.0.3 (GATK4 pipeline)
GATK 4 pipeline
NGS_DNA 3.5.6 (Wildebeest)
- a samplesheet with more than 200 rows no longer produces an error
- XT-HS bugfix in CoverageCalculations
4.0.1 Pilot
Merge pull request #261 from TDMedina/release_version WIP updates to the WES pipeline in preparation for future Gearshift deployment.