Releases: fulcrumgenomics/fgbio

Release 1.5.0

11 Jan 07:25

Major security bug:

  • Force the log4j transitive dependency (pulled in through GKL) to a version that is not affected by the zero-day exploit (#747 and #751).
    See CVE-2021-44228.

Updates to tools in this release:

  • AnnotateBamWithUmis
    • Should ignore extra FASTQ records with --sorted (#735)
    • Optionally annotate UMI base qualities (#733)
    • Fix a bug where molecular barcodes could be truncated. This only occurs
      with read structures that have either no molecular barcodes or two or more
      molecular barcodes (#742).
    • Add support for multiple input FASTQs (#657)
  • PickIlluminaIndices to choose from an existing set of candidates (#641)
  • FastqToBam can output UMI qualities (#740)
  • FilterSomaticVcf adds the end repair artifact filter (#677)

Updates to APIs in this release:

  • Add better error messages for malformed input to Metric classes (#755)
  • Log the last record when sorting and writing SAM/BAM (#650)
  • Removed the IterableThreadLocal class and use the one in commons (#730)
  • Add queryname sorted SamRecord and Template iterators (#516)
  • Allow VcfWriter to write to file links, devices, and named pipes (#753)
  • Update Intel GKL to 0.8.8 to pull in bug fixes (#676)
  • Speed up property access on Cigar case class (#754)
  • Skip empty lines at end of sample sheet when parsing sample data (#737)
  • Updates the commons dependency to 1.3.0, to include a bug fix (fulcrumgenomics/commons#74)

Thank you to existing and new contributors, and thank you to the users!

Release 1.4.0

19 Oct 20:32

Important: Scala 2.12 cross-build support has been removed. (#614)

New tools in this release:

  • FixVcfPhaseSet: Add a tool to fix the VcfPhaseSet (#612)

Updates to tools in this release:

  • SplitBam: Add an option to reduce memory usage if the input has many read groups (#622)
  • EstimatePoolingFractions: exclude sites at min coverage (#638)
  • EstimatePoolingFractions: use GT.AF for per-sample allele frequencies (#637)
  • MakeMixtureVcf: make more tolerant of fractions that don't add up to 1 because floating point math is hard with lots of samples. (#640)
  • GroupReadsByUmi: optionally allows inter-contig pairs (#648)
  • AnnotateBamWithUmis: Support a read structure for the FASTQ (#670)
  • TrimPrimers: can trim only R1s (#681)
  • CollectDuplexSeqMetrics: fix a typo in the usage (#691)
  • CollectDuplexSeqMetrics: add a plot for duplex yield (#692)
  • DemuxFastqs: remove the erroneous mention of --sample-sheet (#658)
  • CorrectUmis: add a cache (#702)
  • DemuxFastqs: Add an option to insert sample barcodes in the FASTQ header (#711)
    • Added --omit-fastq-read-numbers to skip appending the trailing /1 and /2 to the output FASTQs.
    • Added --include-sample-barcodes-in-fastq to replace the last field in the first comment in the FASTQ header.
    • Added --illumina-file-names to name output FASTQs according to Illumina filename conventions
    • Deprecated --illumina-standards option in favor of the three options above
    • Added --platform option to specify the sequencing platform in the BAM read group header. Input FASTQ header must conform to Illumina standards when adding the sample barcode above
  • DemuxFastqs: Add an option to filter reads on the header filter flag (#713)
    • Added --omit-failing-reads to only output reads marked as passing in the FASTQ header comments.
  • DemuxFastqs: Add an option to filter on the internal control flag, with accompanying tests (#714)
    • Added --omit-control-reads to omit any reads marked as control in the FASTQ read header comment.
  • DemuxFastqs: Add an option to mask bases below a specified quality threshold (#716)
    • Added --quality-threshold to specify a threshold to use for masking bases. Bases with a quality score below the threshold are replaced with N's.
  • ErrorRateByReadPosition: Improve error message when no reference fasta .dict is provided (#728)
  • DemuxFastqs: Add metrics on base quality to the sample barcode metrics output (#720)
  • AnnotateBamWithUmis: Option to indicate that the FASTQ is sorted, so that UMIs can be added more quickly (#729)
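
The quality-threshold masking described for DemuxFastqs above can be illustrated with a minimal Python sketch. This is not fgbio's implementation (fgbio is written in Scala). The function name `mask_low_quality` is hypothetical, and the sketch assumes Phred+33 encoded quality strings.

```python
def mask_low_quality(bases: str, quals: str, threshold: int, offset: int = 33) -> str:
    """Replace every base whose Phred quality is below `threshold` with 'N'.

    Hypothetical helper for illustration only; assumes Phred+33 qualities.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and qualities must have the same length")
    return "".join(
        # ord(qual) - offset converts the ASCII character to a Phred score
        base if ord(qual) - offset >= threshold else "N"
        for base, qual in zip(bases, quals)
    )


# 'I' is Q40, '5' is Q20, '#' is Q2 under Phred+33
print(mask_low_quality("ACGTAC", "II#I5I", threshold=20))
```

A base exactly at the threshold is kept in this sketch, matching the wording "below the threshold" in the note above.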

Updates to APIs in this release:

  • Updates to make the VCF API code considerably faster when reading VCFs with many samples (#609)
  • Have Metric classes correctly serialize EnumEntry fields to string (#601)
  • Add a brief description to AssignPrimersMetric (#616)
  • Support assembling JAR files with Java 11 (#645)
  • SampleSheet checks ID unique between samples with/without Lane (#684)
  • Log the last progress in Bams.queryGroupedIterator (#700)
  • Validate that a Variant and its Genotypes have the same alleles (#703)
  • Add "biotype" to Gene and update NcbiRefSeqParser to support more gene biotypes (#706)
  • Updates how NCBI RefSeq GFFs are parsed to enable parsing of genes that do not have canonical transcript entries below them (#706).
  • Add methods to make a Variant locatable (#699)
  • GenomicRange to support contig names with colons (#708)
  • Add helpers for mateCigar and matesOverlap on SamRecord (#717)
  • Resolve bug where empty string fields in Metric files would yield ':none:' values in the case class (#724).
  • Unify and add caching to the way Metric class names are accessed (#724)
  • Adding one more gene biotype for SRP_RNA. (#726)

Release 1.3.0

25 Aug 12:23

Important: This is the last version of fgbio that supports Scala 2.12. This only affects developers who use fgbio in their projects (not end-users of the toolkit running tools). Moving forward, fgbio will support Scala 2.13 only.

New tools in this release:

  • AssignPrimers: takes a BAM file and a file of primer metadata and adds auxiliary tags to the BAM file to identify which primers likely generated which inserts/reads.

Updates to tools in this release:

  • UpdateGffContigNames: fix for bug that caused generation of misformatted GFFs (#591)
  • ErrorRateByReadPosition: option to not collapse substitution types (e.g. report A>C and T>G separately) (#608)
  • UpdateFastaContigNames: (i) option to sort output FASTA, and (ii) option to add in missing contigs from a second FASTA file (#590)
  • UpdateDelimitedFileContigNames: option to sort output file (#598)

Updates to APIs in this release:

  • Updated GenomicRange to handle point positions (e.g. chr1:123) and also add GenomicRange.apply() so it can be used as a command line argument
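
As a rough illustration of what parsing a point position such as chr1:123 alongside a full range involves, here is a minimal Python sketch. It is not the fgbio `GenomicRange` code; `Range` and `parse_range` are hypothetical names, and the accepted syntax `<contig>:<start>[-<end>]` is an assumption.

```python
from typing import NamedTuple


class Range(NamedTuple):
    contig: str
    start: int
    end: int


def parse_range(text: str) -> Range:
    """Parse '<contig>:<start>' or '<contig>:<start>-<end>' (hypothetical syntax)."""
    # Split on the LAST colon so the contig name may itself contain colons
    contig, sep, span = text.rpartition(":")
    if not sep or not contig:
        raise ValueError(f"expected <contig>:<start>[-<end>], got {text!r}")
    start_text, dash, end_text = span.partition("-")
    start = int(start_text)
    # A point position has no '-<end>' part, so it spans a single base
    end = int(end_text) if dash else start
    if end < start:
        raise ValueError(f"end before start in {text!r}")
    return Range(contig, start, end)


print(parse_range("chr1:123"))
print(parse_range("chr1:100-200"))
```

Splitting on the last colon keeps the position separate from the contig name, at the cost that a bare contig name containing a colon cannot be told apart from a point position.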

Release 1.2.0

04 Jun 08:09

Release 1.2.0 is a minor feature release.

This release adds tools to reformat FASTAs/GFFs based on alternate names (#467, #584, #585, #586)

  • CollectAlternateContigNames: Collates the alternate contig names from an NCBI assembly report.
  • UpdateFastaContigNames: Updates the sequence names in a FASTA.
  • UpdateGffContigNames: Updates the contig names in a GFF.
  • UpdateIntervalListContigNames: Updates the sequence names in an Interval List file.
  • UpdateVcfContigNames: Updates the contig names in a VCF.
  • UpdateDelimitedFileContigNames: Updates the contig names in a delimited data file.

The following API changes were also introduced:

  • add another date format Illumina uses in the RunInfo.xml (#555)
  • add support for ISO 8601 dates in RunInfo.xml, as Illumina has started using that format (#582)
  • add the primary keyword for accessing a SamRecord's secondary flag (#560)
  • fix a bug to allow setting the primary flag on SamRecord (#562)
  • Two fixes to RefFlatSource (#564):
    • Exons were not being put into transcripts in transcription order which is required (but not verified in Transcript)
    • Gene start/end were being taken as the min of the transcript starts/ends, but for end it should be max
  • Changed the gene annotation case classes and the RefFlatSource to resolve two issues (#568):
    1. RefFlatSource would drop transcript mappings if there were mappings to > 1 chrom or > 1 strand for a given gene
    2. RefFlatSource would combine transcript mappings at wildly different locations on the same chrom/strand
  • Added a source class for parsing and reading gene annotations from an NCBI RefSeq GFF file (#573)
  • Add test to Metric for reading and writing chars (#574)
  • Remove system utility code ported to commons for the fgbio CLI (#576)
  • Migrated gzip support to commons (#575)
  • API for sequence dictionaries (#581)
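
The gene start/end fix to RefFlatSource noted above comes down to taking the minimum over transcript starts but the maximum over transcript ends. A minimal Python sketch, using a hypothetical `gene_span` helper and made-up coordinates (not fgbio code):

```python
from typing import Iterable, Tuple


def gene_span(transcripts: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (start, end) covering all (start, end) transcript intervals."""
    spans = list(transcripts)
    if not spans:
        raise ValueError("a gene needs at least one transcript")
    start = min(s for s, _ in spans)  # leftmost transcript start
    end = max(e for _, e in spans)    # rightmost transcript end, not the min
    return start, end


# Illustrative coordinates only
print(gene_span([(100, 500), (150, 900), (120, 700)]))
```

Using the minimum for the end, as in the bug described, would have reported 500 here and cut off the two longer transcripts.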

Release 1.1.0

07 Nov 23:44

Release 1.1.0 is a minor feature release with the following updates:

  • Add support for single-end reads in the consensus building tool chain (i.e. GroupReadsByUmi and CallMolecularConsensusReads)
  • Do not try to automatically index BAM files with HTSJDK when a long reference sequence is present in the sequence dictionary
  • Fix hash collision problem that could cause sorting in RandomQuery order to do the wrong thing
  • Change to avoid using the Intel inflater/deflater on Mac OS X due to a bug in the Mac implementation
  • Scala API for reading and writing Picard's IntervalList files
  • Fix bug in the calculation of the consensus UMI bases during duplex consensus calling
  • Various updates to SamBuilder
  • Minor updates to VCF API

Release 1.0.0

06 Aug 19:56

Major feature release with the following changes:

Major Changes

  • Cross-building support moved from [2.11, 2.12] -> [2.12, 2.13]
  • Support added for the high-performance Intel Inflater and Deflater for working with gzipped data
  • Significant performance improvements to CallDuplexConsensusReads and the addition of multi-threaded calling
  • A new 100% Scala API for reading, writing and working with VCF files

Minor Changes

  • Broken pipes while writing to stdout/stderr will print a concise error instead of a long stack trace
  • Common option to fgbio.jar to set validation stringency when reading/writing SAM/BAM
  • Minor fixes to HapCutToVcf
  • UmiConsensusCaller and related tools now merge platform values in read groups case-insensitively

Release 0.8.1

29 Mar 22:28

Minor point release with a single new tool to sort FASTQ files by read name and number.

Release 0.8.0

14 Feb 00:14

Major release with the following changes:

  • Major improvements to the pairwise Aligner class:
    • Significant performance improvements in the Aligner class for pairwise alignments
    • When aligning DNA sequences aligner will produce matches in CIGAR for matches between compatible IUPAC codes (e.g. R paired with A or G)
    • New method to produce all alignments above a score threshold from a pair of sequences
    • New interface to allow for custom gap scoring
  • Added Sequences.revcomp() function that correctly reverse complements all IUPAC DNA/RNA codes
  • Added method to Metric class to return an Iterator over a metrics file instead of reading the whole file into memory
  • Io object now automatically supports bgzipped files with .bgz or .bgzip extensions
  • Fixed bug in SamReader that would occasionally cause exceptions with overlapping query regions
  • Updated to the latest Scala point version to create classes/JARs compatible with JDK 9 and 10 at runtime
  • Added method to ExtractBasecallingParamsForPicard to enable easy access to unmatched BAM file path
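
To make the IUPAC items above concrete, here is a minimal Python sketch of code compatibility and of reverse complementing ambiguity codes. It is not the fgbio `Aligner` or `Sequences.revcomp()` implementation. It assumes that two codes are "compatible" when their base sets overlap, and it covers DNA codes only (no U); the names `compatible` and `revcomp` are used here for illustration.

```python
# Each IUPAC DNA code maps to the set of bases it can represent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table: e.g. R (A/G) complements to Y (T/C), B (C/G/T) to V (G/C/A)
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def compatible(a: str, b: str) -> bool:
    """True if the two codes share at least one base (assumed definition)."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))


def revcomp(seq: str) -> str:
    """Reverse complement a DNA sequence, preserving case and ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


print(compatible("R", "A"), compatible("R", "C"))
print(revcomp("AACCGR"))
```

Under this definition R is compatible with A or G, as in the example given for the Aligner, but not with C or T.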

Release 0.7.0

06 Nov 20:43

Release 0.7.0 introduces the following changes to existing tools:

  • GroupReadsByUmi
    • check that the raw UMI tag is found for each read (#406)
    • Fix log message in GroupReadsByUmi to be more accurate / less misleading (#436)
  • DemuxFastqs: enable --quality-encoding to be used on the command line (#417)
  • HapCutToVcf
    • fix ambiguous (IUPAC) reference bases on the fly (#418)
    • add an option to skip indexing the output file (ex. when the input does not have CONTIG lines) (#418)

In addition, the following new tools were added:

  • FindSwitchbackReads: Tool to detect templates with strand-switch events in them (#438)

The following API changes were also introduced:

  • FastqSource can handle read numbers > 2 (#408)
  • Fixed writing and parsing of Double.NaN, Double.PositiveInfinity and Double.NegativeInfinity in Metric classes (#411)
  • SamBuilder should accept missing bases and quals with a cigar (#424)
  • Add message to require() call in Sample (#425)
  • ReadStructure to allow and strip out whitespace within the read structure during parsing (#425)
  • ProgressLogger.record now returns whether logging was triggered, and a method was added to log the last record (#421)
  • Bug fix: Metric.write was not closing its writer (#421)
  • Adding a few useful methods to Sequences (#421)
  • Metric now extends Commons Writer so we can use AsyncWriter on it (#437)
  • Improve the error message when validating a sample sheet (#412)
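
The ReadStructure change above (allowing whitespace and stripping it during parsing) can be sketched in Python as follows. This is an illustration, not the fgbio parser. It assumes read structures are a run of `<length><type>` segments with the four segment types T, B, M and S, where only the last segment may use `+` for "all remaining bases"; `parse_read_structure` is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

# One segment: a positive length (or '+') followed by a segment type
SEGMENT = re.compile(r"(\d+|\+)([TBMS])")


def parse_read_structure(text: str) -> List[Tuple[Optional[int], str]]:
    """Parse e.g. '8M 12S +T' into [(8, 'M'), (12, 'S'), (None, 'T')]."""
    compact = "".join(text.split())  # strip all whitespace before parsing
    if not compact:
        raise ValueError("empty read structure")
    segments: List[Tuple[Optional[int], str]] = []
    pos = 0
    while pos < len(compact):
        match = SEGMENT.match(compact, pos)
        if match is None:
            raise ValueError(f"invalid read structure at offset {pos}: {compact!r}")
        length, kind = match.groups()
        # None stands for '+', i.e. an indefinite-length segment
        segments.append((None if length == "+" else int(length), kind))
        pos = match.end()
    if any(length is None for length, _ in segments[:-1]):
        raise ValueError("only the last segment may have length '+'")
    if any(length == 0 for length, _ in segments):
        raise ValueError("segment lengths must be positive")
    return segments


print(parse_read_structure("8M 12S +T"))
```

Stripping whitespace up front means "8M12S+T" and "8M 12S +T" parse to the same segments.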

Release 0.6.1

18 May 16:43

Bug fix release which resolves a problem introduced in a dependency that caused fgbio to be unable to read BAM files from stdin or named pipes. All users of 0.6.0 should upgrade to 0.6.1.