Python 0.5.0

@github-actions released this 22 Jun 15:05
· 394 commits to main since this release

Major Feature Release

Breaking Changes

  • The JSON metadata codec now interprets the empty string as an empty object. This means
    that applying a schema to an existing table will no longer necessitate modifying the
    existing rows. (@benjeffery, #2064, #2104)

  • Remove the previously deprecated as_bytes argument to TreeSequence.variants.
    If you need genotypes in byte form this can be done following the code in the
    to_macs method on line 5573 of trees.py.
    This argument was initially deprecated more than 3 years ago when the code was part of
    msprime.
    (@benjeffery, #605, #2172)

  • Arguments after ploidy in write_vcf are now keyword-only.
    (@jeromekelleher, #2329, #2315).

  • When metadata equal to b'' is printed to text or HTML tables it will render as
    an empty string rather than "b''". (@hyanwong, #2349, #2351)
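
The two metadata changes above can be illustrated with a minimal pure-Python sketch. It is not tskit's implementation: the helper names decode_json_metadata and metadata_to_text are hypothetical and only model the behaviour described in the notes (an empty string decodes to an empty object, and b'' renders as an empty string in tables).

```python
import json


def decode_json_metadata(encoded: bytes) -> dict:
    """Hypothetical helper: decode JSON metadata, treating b'' as {}."""
    # Rows written before a schema was applied typically hold empty
    # metadata, so they decode cleanly without being rewritten.
    if len(encoded) == 0:
        return {}
    return json.loads(encoded.decode("utf-8"))


def metadata_to_text(raw: bytes) -> str:
    """Hypothetical helper: render raw metadata for a text table cell."""
    # Empty metadata is shown as an empty string rather than "b''".
    if raw == b"":
        return ""
    return str(raw)


if __name__ == "__main__":
    print(decode_json_metadata(b""))
    print(decode_json_metadata(b'{"name": "sample_1"}'))
    print(repr(metadata_to_text(b"")))
```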

Changes

  • Add a min_time parameter to draw_svg that lets the time of the youngest node
    be used as the minimum of the y axis, so that negative times can be drawn.
    (@hyanwong, #2197, #2215)

  • VcfWriter.write now prints the site ID of variants in the ID field of the
    output VCF files.
    (@roohy, #2103, #2107)

  • Make dumping of tables and tree sequences to disk a zero-copy operation.
    (@benjeffery, #2111, #2124)

  • Add a copy argument to TreeSequence.variants which, if False, reuses the
    returned Variant object for improved performance. Defaults to True.
    (@benjeffery, #605, #2172)

  • tree.mrca now takes 2 or more arguments and gives the common ancestor of them all.
    (@savitakartik, #1340, #2121)

  • Add an edge attribute to the Mutation class that gives the ID of the
    edge that the mutation falls on.
    (@jeromekelleher, #685, #2279).

  • Add the TreeSequence.split_edges operation which inserts nodes into
    edges at a specific time.
    (@jeromekelleher, #2276, #2296).

  • Add the TreeSequence.decapitate (and closely related
    TableCollection.delete_older) operation to remove topology and mutations
    older than a given time.
    (@jeromekelleher, #2236, #2302, #2331).

  • Add the TreeSequence.individuals_time and TreeSequence.individuals_population
    methods to return arrays of per-individual times and populations, respectively.
    (@petrelharp, #1481, #2298).

  • Add the sample_mask and site_mask arguments to write_vcf to allow parts
    of an output VCF to be omitted or marked as missing data. Also add the
    as_vcf convenience function, to return VCF as a string.
    (@jeromekelleher, #2300).

  • Add support for missing data to write_vcf, and add the isolated_as_missing
    argument. (@jeromekelleher, #2329, #447).

  • Add Tree.num_children_array and Tree.num_children. These return the number
    of child nodes for every node in the tree and for a single node, respectively.
    (@GertjanBisschop, #2318, #2319, #2332)

  • Add Tree.path_length.
    (@jeremyguez, #2249, #2259).

  • Add B1 tree balance index.
    (@jeremyguez, @jeromekelleher, #2251, #2281, #2346).

  • Add B2 tree balance index.
    (@jeremyguez, @jeromekelleher, #2252, #2353, #2354).

  • Add Sackin tree imbalance index.
    (@jeremyguez, @jeromekelleher, #2246, #2258).

  • Add Colless tree imbalance index.
    (@jeremyguez, @jeromekelleher, #2250, #2266, #2344).

  • Add direction argument to TreeSequence.edge_diffs, allowing iteration
    over diffs in the reverse direction. NOTE: this comes with a ~10% performance
    regression as the implementation was moved from C to Python for simplicity
    and maintainability. Please open an issue if this affects your application.
    (@jeromekelleher, @benjeffery, #2120).

  • Add Tree.edge_array and Tree.edge. These return the ID of the edge encoding
    the relationship between each node, or a single node, and its parent.
    (@GertjanBisschop, #2361, #2357)
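
Several of the tree-level additions above (multi-argument mrca, child counts, path length, and the Sackin and Colless indexes) can be illustrated on a toy tree. The following is a pure-Python sketch under stated assumptions, not tskit's implementation: the tree is a hypothetical child-to-parent dictionary, all function names are illustrative, Sackin is taken as the sum of leaf depths, and Colless as the sum over internal nodes of the absolute difference in leaf counts between the two children of a binary tree.

```python
from collections import defaultdict

# Toy tree as a child -> parent map; None marks the root (node 6).
# Leaves are 0, 1, 2, 3. This layout is an assumption for illustration.
PARENT = {0: 4, 1: 4, 2: 5, 4: 5, 3: 6, 5: 6, 6: None}


def children_of(parent):
    """Map each internal node to the list of its children."""
    kids = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            kids[par].append(child)
    return dict(kids)


def path_to_root(parent, u):
    """Nodes from u up to the root, inclusive, in order."""
    path = []
    while u is not None:
        path.append(u)
        u = parent[u]
    return path


def mrca(parent, *nodes):
    """Most recent common ancestor of two or more nodes."""
    if len(nodes) < 2:
        raise ValueError("mrca needs at least two nodes")
    others = [set(path_to_root(parent, v)) for v in nodes[1:]]
    for u in path_to_root(parent, nodes[0]):
        if all(u in ancestors for ancestors in others):
            return u
    return None


def depth(parent, u):
    """Number of edges between u and the root."""
    return len(path_to_root(parent, u)) - 1


def path_length(parent, u, v):
    """Number of edges on the path between u and v."""
    a = mrca(parent, u, v)
    return depth(parent, u) + depth(parent, v) - 2 * depth(parent, a)


def num_children(parent, u):
    """Number of child nodes of u."""
    return len(children_of(parent).get(u, []))


def num_leaves_below(kids, u):
    """Number of leaves in the subtree rooted at u."""
    if not kids.get(u):
        return 1
    return sum(num_leaves_below(kids, c) for c in kids[u])


def sackin_index(parent):
    """Sum of leaf depths (larger means more imbalanced)."""
    kids = children_of(parent)
    return sum(depth(parent, u) for u in parent if u not in kids)


def colless_index(parent):
    """Sum of |left leaves - right leaves| over internal nodes."""
    kids = children_of(parent)
    total = 0
    for u, cs in kids.items():
        if len(cs) != 2:
            raise ValueError("Colless index is defined for binary trees")
        left, right = cs
        total += abs(num_leaves_below(kids, left) - num_leaves_below(kids, right))
    return total


if __name__ == "__main__":
    print(mrca(PARENT, 0, 1, 2))
    print(path_length(PARENT, 0, 2))
    print(sackin_index(PARENT), colless_index(PARENT))
```

A parent map keeps the sketch short; for real analyses the Tree methods named in the notes should be used instead, since they operate on tskit's own data structures.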