Release 0.2.0

@nh13 nh13 released this 22 Jun 21:11

Release 0.2.0 introduces the following changes to existing tools:

  • added global arguments accessible to all tools, which are given as arguments prior to the tool name:
    • --tmp-dir: directory to use for temporary files.
    • --compression: default GZIP compression level, BAM compression level.
    • --async-io: use asynchronous I/O where possible, e.g. for SAM and BAM files.
  • numerous changes to the tool documentation to support output in Markdown format.
  • DuplexConsensusCaller:
    • added logging of statistics for DuplexConsensusCaller.
    • added quality trimming.
    • improved the method for finding the set of "compatible" cigars, which is used to filter the reads from which a consensus is built.
  • DemuxFastqs:
    • the output directory is now created if it does not exist.
    • the change to the new quality format detector caused the detected encoding not to be printed.
  • ClipOverlappingReads is deprecated in favor of ClipBam.
  • SampleSheet and ExtractBasecallingParamsForPicard:
    • if the library identifier (Library_Id column) does not exist, it will default to the sample identifier (Sample_Id column); previously it defaulted to the sample name (Sample_Name column).
  • HapCutToVcf: updated to support updated HapCut2 outputs.
    • the full FORMAT field in the VCF is printed, including trailing missing values.
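
The Library_Id fallback described above for SampleSheet can be illustrated with a minimal sketch. The `Row` type and the `libraryId` helper are hypothetical and are not part of fgbio's API; treating an empty value the same as a missing column is also an assumption.

```scala
// Hypothetical, simplified model of one sample sheet row (column name -> value).
// This is NOT fgbio's SampleSheet class; it only illustrates the fallback rule.
object LibraryIdFallback {
  type Row = Map[String, String]

  /** Returns Library_Id when present, otherwise falls back to Sample_Id.
    * Assumption: an empty Library_Id value is treated like a missing column.
    */
  def libraryId(row: Row): String =
    row.get("Library_Id").filter(_.nonEmpty).getOrElse(row("Sample_Id"))
}
```

With this sketch, a row that has only Sample_Id and Sample_Name resolves to the Sample_Id value, not to the Sample_Name value as it did before this release.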

In addition, the following new tools were added:

  • FastqToBam: generates an unmapped BAM (or SAM or CRAM) file from fastq files.
  • BuildToolDocs: generates the suite of per-tool Markdown documents.
  • SplitBam: splits a BAM into multiple BAMs, one per read group (or library).
  • ClipBam: clips reads from the same template; replaces ClipOverlappingReads.
  • CollectDuplexSeqMetrics: generates metrics for duplex sequencing quality control.

Next, a new API for reading and writing SAM/BAM files, built around Scala idioms, was added:

  • SamRecord: a replacement for htsjdk's SAMRecord with more Scala-esque fields and methods.
  • SamSource: a class for reading SAM/BAM/CRAM files and for querying them.
  • SamWriter: a class for writing SAM/BAM/CRAM files and sorting them.
  • SamOrder: a trait for specifying SAM/BAM orderings; in addition to coordinate and queryname sort orders, includes useful and novel sorts such as:
    • random: generates a random order over all the reads.
    • randomquery: generates a random order with queryname grouping.
    • templatecoordinate: the sort order used by GroupReadsByUmi; sorts reads by the lower unclipped 5' coordinate of the read pair, followed by the higher unclipped 5' coordinate of the read pair.
    • unsorted: the official "unsorted" ordering.
    • unknown: the official "unknown" ordering.
  • Bams: methods for manipulating sequences of SamRecords and other useful utility methods.
    • contains sorting methods that have better disk-backed sorting than htsjdk's for faster sorting of SAM/BAM files.
  • SamBuilder: a class for building SAM/BAM files and records; useful for generating test-cases for unit tests.
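
To make the templatecoordinate description above concrete, here is a minimal sketch of a sort key built from the two unclipped 5' coordinates of a read pair. `PairEnds` and `TemplateCoordinateSketch` are hypothetical names and do not come from fgbio. The sketch assumes all pairs lie on the same reference sequence and leaves out any other fields the real ordering may use.

```scala
// Hypothetical stand-in for a read pair; NOT fgbio's SamRecord.
// r1FivePrime / r2FivePrime are the unclipped 5' coordinates of each read.
final case class PairEnds(name: String, r1FivePrime: Int, r2FivePrime: Int) {
  def lower: Int  = math.min(r1FivePrime, r2FivePrime)
  def higher: Int = math.max(r1FivePrime, r2FivePrime)
}

object TemplateCoordinateSketch {
  // Order by the lower unclipped 5' coordinate first, then by the higher one.
  // Assumption: a single reference sequence; other tie-breakers are omitted.
  val byTemplateCoordinate: Ordering[PairEnds] =
    Ordering.by((p: PairEnds) => (p.lower, p.higher))

  def sort(pairs: Seq[PairEnds]): Seq[PairEnds] =
    pairs.sorted(byTemplateCoordinate)
}
```

Because the key depends only on the pair's two end coordinates, pairs with identical ends sort next to each other regardless of which read is listed first.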

Finally, the following other changes were made:

  • support for Scala 2.12.2; we use this version by default.
  • support for cross-building and publishing of Scala 2.11.8 and 2.12.2.
  • uses the 0.2.0 release of sopt and commons.