Releases: fulcrumgenomics/fgbio
Release 1.5.0
Major security bug:
- Force the log4j transitive dependency (pulled in through GKL) to a version that does not have the zero-day exploit (#747 and #751)
See CVE-2021-44228.
Updates to tools in this release:
- AnnotateBamWithUmis
  - Should ignore extra FASTQ records with --sorted (#735)
  - Optionally annotate UMI base qualities (#733)
  - Fix a bug where molecular barcodes could be truncated. This only occurs with read structures that have either no molecular barcodes or two or more molecular barcodes (#742).
  - Add support for multiple input FASTQs (#657)
- PickIlluminaIndices: can choose from an existing set of candidates (#641)
- FastqToBam: can output UMI qualities (#740)
- FilterSomaticVcf: adds the end repair artifact filter (#677)
Updates to APIs in this release:
- Add better error messages for malformed input to Metric classes (#755)
- Log the last record when sorting and writing SAM/BAM (#650)
- Removed the IterableThreadLocal class and use the one in commons (#730)
- Add queryname sorted SamRecord and Template iterators (#516)
- Allow VcfWriter to write to file links, devices, and named pipes (#753)
- Update Intel GKL to 0.8.8 to pull in bug fixes (#676)
- Speed up property access on Cigar case class (#754)
- Skip empty lines at end of sample sheet when parsing sample data (#737)
- Updates the commons dependency to 1.3.0, to include a bug fix (fulcrumgenomics/commons#74)
Thank-you to existing and new contributors:
- Fulcrum Genomics:
  - Tim Fennell (@tfenne)
  - Nils Homer (@nh13)
  - Kari Stromhaug (@kstromhaug)
- Twinstrand Biosciences:
  - Clint Valentine (@clintval)
  - Michael Hipp (@mjhipp)
  - Thomas Smith (@ThomasHSmith)
- Outside contributors:
  - Jordi (@Poshi)
And thank-you to the users!
Release 1.4.0
Important: Scala 2.12 cross-build support has been removed. (#614)
New tools in this release:
- FixVcfPhaseSet: Add a tool to fix the VcfPhaseSet (#612)
Updates to tools in this release:
- SplitBam: Add an option to reduce memory usage if the input has many read groups (#622)
- EstimatePoolingFractions: exclude sites at min coverage (#638)
- EstimatePoolingFractions: use GT.AF for per-sample allele frequencies (#637)
- MakeMixtureVcf: make more tolerant of fractions that don't add up to 1 because floating point math is hard with lots of samples. (#640)
- GroupReadsByUmi: optionally allows inter-contig pairs (#648)
- AnnotateBamWithUmis: Support a read structure for the FASTQ (#670)
- TrimPrimers: can trim only R1s (#681)
- CollectDuplexSeqMetrics: fix a typo in the usage (#691)
- CollectDuplexSeqMetrics: add a plot for duplex yield (#692)
- DemuxFastqs: remove the erroneous mention of --sample-sheet (#658)
- CorrectUmis: add a cache (#702)
- DemuxFastqs: Add an option to insert sample barcodes in the FASTQ header (#711)
  - Added --omit-fastq-read-numbers to skip appending the trailing /1 and /2 to the output FASTQs.
  - Added --include-sample-barcodes-in-fastq to replace the last field in the first comment in the FASTQ header.
  - Added --illumina-file-names to name output FASTQs according to Illumina filename conventions
  - Deprecated --illumina-standards option in favor of the three options above
  - Added --platform option to specify the sequencing platform in the BAM read group header.
  - Input FASTQ header must conform to Illumina standards when adding the sample barcode above
- DemuxFastqs: Add an option to filter reads on the header filter flag (#713)
  - Added the option --omit-failing-reads to only output reads marked as passing in the FASTQ header comments.
- DemuxFastqs: Adding option to filter on the internal control flag, and accompanying tests (#714)
  - Added --omit-control-reads to omit any reads marked as control in the FASTQ read header comment.
- DemuxFastqs: Add an option to mask bases below a specified quality threshold (#716)
  - Added --quality-threshold to specify a threshold to use for masking bases. Bases with a quality score below the threshold are replaced with N's.
- ErrorRateByReadPosition: Improve error message when no reference fasta.dict is provided (#728)
- DemuxFastqs: Add metrics on base quality to the sample barcode metrics output (#720)
- AnnotateBamWithUmis: Option to indicate a sorted FASTQ to add UMIs more quickly (#729)
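The DemuxFastqs filtering and masking options above can be pictured with a small sketch. This is not fgbio's implementation: `FastqRecord`, `parse_comment`, `mask_low_quality`, and `process` are hypothetical names, qualities are assumed to be Phred+33 encoded, and the header comment is assumed to follow the Illumina layout `<read>:<is filtered>:<control number>:<sample barcode>`, where `Y` marks a read that failed the filter and a control number of `0` marks a non-control read.

```python
from typing import NamedTuple, Optional, Tuple


class FastqRecord(NamedTuple):
    header: str  # text after '@', e.g. "inst:1:FC:1:1:10:20 1:N:0:ACGT"
    bases: str
    quals: str   # assumed Phred+33 encoded


def parse_comment(header: str) -> Tuple[bool, bool]:
    """Return (failed_filter, is_control) from an Illumina-style header comment."""
    comment = header.split(" ", 1)[1]
    _read_number, is_filtered, control_number, _barcode = comment.split(":", 3)
    return is_filtered == "Y", int(control_number) != 0


def mask_low_quality(bases: str, quals: str, threshold: int) -> str:
    """Replace every base whose quality is below the threshold with 'N'."""
    return "".join(
        "N" if ord(q) - 33 < threshold else b for b, q in zip(bases, quals)
    )


def process(
    rec: FastqRecord,
    omit_failing_reads: bool = False,
    omit_control_reads: bool = False,
    quality_threshold: Optional[int] = None,
) -> Optional[FastqRecord]:
    """Return the (possibly masked) record, or None if it should be omitted."""
    failed_filter, is_control = parse_comment(rec.header)
    if omit_failing_reads and failed_filter:
        return None
    if omit_control_reads and is_control:
        return None
    if quality_threshold is None:
        return rec
    return rec._replace(
        bases=mask_low_quality(rec.bases, rec.quals, quality_threshold)
    )
```

In this sketch masking only changes the bases; the quality string is left untouched, which is an assumption rather than something the notes state.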
Updates to APIs in this release:
- Updates to make the VCF API code considerably faster when reading VCFs with many samples (#609)
- Have Metric classes correctly serialize EnumEntry fields to string (#601)
- Add a brief description to AssignPrimersMetric (#616)
- Support assembling JAR files with Java 11 (#645)
- SampleSheet checks ID unique between samples with/without Lane (#684)
- Log the last progress in Bams.queryGroupedIterator (#700)
- Validate that a Variant and its Genotypes have the same alleles (#703)
- Add "biotype" to Gene and update NcbiRefSeqParser to support more gene biotypes (#706)
- Updates how NCBI RefSeq GFFs are parsed to enable parsing of genes that do not have canonical transcript entries below them (#706).
- Add methods to make a Variant locatable (#699)
- GenomicRange to support contig names with colons (#708)
- Add helpers for mateCigar and matesOverlap on SamRecord (#717)
- Resolve bug where empty string fields in Metric files would yield ':none:' values in the case class (#724).
- Unify and add caching to the way Metric class names are accessed (#724)
- Adding one more gene biotype for SRP_RNA. (#726)
Release 1.3.0
Important: This is the last version of fgbio that supports Scala 2.12. This only affects developers who use fgbio in their projects (not end users of the toolkit running tools). Moving forward, fgbio will support Scala 2.13 only.
New tools in this release:
- AssignPrimers: takes a BAM file and a file of primer metadata and adds auxiliary tags to the BAM file to identify which primers likely generated which inserts/reads.
Updates to tools in this release:
- UpdateGffContigNames: fix for bug that caused generation of misformatted GFFs (#591)
- ErrorRateByReadPosition: option to not collapse substitution types (e.g. report A>C and T>G separately) (#608)
- UpdateFastaContigNames: (i) option to sort output FASTA, and (ii) option to add in missing contigs from a second FASTA file (#590)
- UpdateDelimitedFileContigNames: option to sort output file (#598)
Updates to APIs in this release:
- Updated GenomicRange to handle point positions (e.g. chr1:123) and also add GenomicRange.apply() so it can be used as a command line argument
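As a rough illustration of what parsing such a range involves, the sketch below accepts both `<contig>:<start>-<end>` and the point form `<contig>:<position>`. It is written in Python rather than fgbio's Scala, and it is not the library's code: the `parse_range` function and this `GenomicRange` tuple are hypothetical, and treating a point position as a range whose start equals its end is an assumption. Splitting on the last colon also lets contig names contain colons, which the 1.4.0 notes list as a later addition.

```python
from typing import NamedTuple


class GenomicRange(NamedTuple):
    contig: str
    start: int  # inclusive
    end: int    # inclusive


def parse_range(text: str) -> GenomicRange:
    """Parse '<contig>:<start>-<end>' or the point form '<contig>:<position>'."""
    # Split on the LAST colon so that contig names may themselves contain colons.
    contig, colon, span = text.rpartition(":")
    if not colon or not contig:
        raise ValueError(f"Expected <contig>:<start>[-<end>], got {text!r}")
    start_text, dash, end_text = span.partition("-")
    start = int(start_text)
    # A point position has no '-<end>' part; assume it spans a single base.
    end = int(end_text) if dash else start
    if start > end:
        raise ValueError(f"Start is after end in {text!r}")
    return GenomicRange(contig, start, end)
```

One consequence of this approach is that a bare contig name ending in `:<digits>` is read as a point position, so callers that accept whole-contig arguments would need a separate code path.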
Release 1.2.0
Release 1.2.0 is a minor feature release.
This release adds tools to reformat FASTAs/GFFs based on alternate names (#467, #584, #585, #586)
- CollectAlternateContigNames: Collates the alternate contig names from an NCBI assembly report.
- UpdateFastaContigNames: Updates the sequence names in a FASTA.
- UpdateGffContigNames: Updates the contig names in a GFF.
- UpdateIntervalListContigNames: Updates the sequence names in an Interval List file.
- UpdateVcfContigNames: Updates the contig names in a VCF.
- UpdateDelimitedFileContigNames: Updates the contig names in a delimited data file.
The following API changes were also introduced:
- Add another date format Illumina uses in the RunInfo.xml (#555)
- Add support for ISO 8601 dates in RunInfo.xml as Illumina has started using that now. (#582)
- Add the primary keyword for accessing a SamRecord's secondary flag (#560)
- Fix a bug to allow setting the primary flag on SamRecord (#562)
- Two fixes to RefFlatSource (#564):
  - Exons were not being put into transcripts in transcription order, which is required (but not verified in Transcript)
  - Gene start/end were being taken as the min of the transcript starts/ends, but for end it should be max
- Changed the gene annotation case classes and the RefFlatSource to resolve two issues (#568):
  - RefFlatSource would drop transcript mappings if there were mappings to > 1 chrom or > 1 strand for a given gene
  - RefFlatSource would combine transcript mappings at wildly different locations on the same chrom/strand
- Added a source class for parsing and reading gene annotations from an NCBI RefSeq GFF file (#573)
- Add test to Metric for reading and writing chars (#574)
- Remove system utility code ported to commons for the fgbio CLI (#576)
- Migrated gzip support to commons (#575)
- API for sequence dictionaries (#581)
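The two RefFlatSource fixes (#564) come down to simple coordinate logic, sketched below in Python. The `Exon` and `Transcript` types and both helper functions are hypothetical stand-ins, not fgbio's gene annotation classes. The sketch assumes that transcription order means ascending start on the positive strand and descending start on the negative strand.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    start: int
    end: int


class Transcript(NamedTuple):
    name: str
    negative_strand: bool
    exons: List[Exon]

    @property
    def start(self) -> int:
        return min(e.start for e in self.exons)

    @property
    def end(self) -> int:
        return max(e.end for e in self.exons)


def in_transcription_order(exons: List[Exon], negative_strand: bool) -> List[Exon]:
    """Order exons 5' to 3': ascending on the + strand, descending on the - strand."""
    return sorted(exons, key=lambda e: e.start, reverse=negative_strand)


def gene_bounds(transcripts: List[Transcript]) -> Tuple[int, int]:
    """Gene start is the MIN of transcript starts; gene end is the MAX of transcript ends."""
    return min(t.start for t in transcripts), max(t.end for t in transcripts)
```

With two transcripts ending at 400 and 600, taking the minimum of the ends would report 400 and cut off part of the second transcript, which is the failure mode the second fix describes.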
Release 1.1.0
Release 1.1.0 is a minor feature release with the following updates:
- Add support for single-end reads in the consensus building tool chain (i.e. GroupReadsByUmi and CallMolecularConsensusReads)
- Do not try to automatically index BAM files with HTSJDK when a long reference sequence is present in the sequence dictionary
- Fix hash collision problem that could cause sorting in RandomQuery order to do the wrong thing
- Change to avoid using the Intel inflater/deflater on Mac OS X due to a bug in the Mac implementation
- Scala API for reading and writing Picard's IntervalList files
- Fix bug in the calculation of the consensus UMI bases during duplex consensus calling
- Various updates to SamBuilder
- Minor updates to VCF API
Release 1.0.0
Major feature release with the following changes:
Major Changes
- Cross-building support moved from [2.11, 2.12] -> [2.12, 2.13]
- Support added for the high-performance Intel Inflater and Deflater for working with gzipped data
- Significant performance improvements to CallDuplexConsensusReads and the addition of multi-threaded calling
- A new 100% Scala API for reading, writing and working with VCF files
Minor Changes
- Broken pipes while writing to stdout/stderr will print a concise error instead of a long stack trace
- Common option to fgbio.jar to set validation stringency when reading/writing SAM/BAM
- Minor fixes to HapCutToVcf
- UmiConsensusCaller and related tools now merge platform values in read groups case-insensitively
Release 0.8.1
Minor point release with a single new tool to sort FASTQ files by read name and number.
Release 0.8.0
Major release with the following changes:
- Major improvements to the pairwise Aligner class:
  - Significant performance improvements in the Aligner class for pairwise alignments
  - When aligning DNA sequences, the aligner will produce matches in the CIGAR for matches between compatible IUPAC codes (e.g. R paired with A or G)
  - New method to produce all alignments above a score threshold from a pair of sequences
  - New interface to allow for custom gap scoring
- Added Sequences.revcomp() function that correctly reverse complements all IUPAC DNA/RNA codes
- Added method to Metric class to return an Iterator over a metrics file instead of reading the whole file into memory
- Io object now automatically supports bgzipped files with .bgz or .bgzip extensions
- Fixed bug in SamReader that would occasionally cause exceptions with overlapping query regions
- Updated to latest Scala point version to create classes/JARs compatible with JDK 9 and 10 at runtime
- Added method to ExtractBasecallingParamsForPicard to enable easy access to unmatched BAM file path
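To make the IUPAC handling above concrete, here is a minimal Python sketch of reverse complementing with ambiguity codes and of testing whether two codes are compatible. It uses the standard IUPAC nucleotide table, but it is not fgbio's code: the names `revcomp` and `compatible` are chosen for illustration, U is complemented to A, and two codes are assumed to be compatible when the sets of bases they represent overlap.

```python
# Standard IUPAC complements; U is treated like T and complemented to A.
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)

# The concrete bases each IUPAC code can stand for.
_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def revcomp(seq: str) -> str:
    """Reverse the sequence and complement each IUPAC code, preserving case."""
    return seq[::-1].translate(_COMPLEMENT)


def compatible(a: str, b: str) -> bool:
    """Assumed rule: two codes are compatible if they share at least one base."""
    return bool(set(_BASES[a.upper()]) & set(_BASES[b.upper()]))
```

Under this rule R is compatible with A and with G, matching the example in the notes, and N is compatible with every code.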
Release 0.7.0
Release 0.7.0 introduces the following changes to existing tools:
- GroupReadsByUmi
- DemuxFastqs: enable --quality-encoding to be used on the command line (#417)
- HapCutToVcf
In addition, the following new tools were added:
- FindSwitchbackReads: Tool to detect templates with strand-switch events in them (#438)
The following API changes were also introduced:
- FastqSource can handle read numbers > 2 (#408)
- Fixed writing and parsing of Double.NaN, Double.PositiveInfinity and Double.NegativeInfinity in Metric classes (#411)
- SamBuilder should accept missing bases and quals with a cigar (#424)
- Add message to require() call in Sample (#425)
- ReadStructure to allow and strip out whitespace within the read structure during parsing (#425)
- ProgressLogger.record should return whether logging was triggered, and add a method to log the last record (#421)
- Bug fix: Metric.write was not closing its writer (#421)
- Adding a few useful methods to Sequences (#421)
- Metric now extends Commons Writer so we can use AsyncWriter on it (#437)
- Improve the error message when validating a sample sheet. (#412)
Release 0.6.1
Bug fix release which resolves a problem introduced in a dependency that caused fgbio to be unable to read BAM files from stdin or named pipes. All users of 0.6.0 should upgrade to 0.6.1.