Phase error #1876
Replies: 2 comments 1 reply
-
Hi @AliPearson 👋 Here's a basic example of how to introduce phase errors into the output haplotypes: import numpy as np
import msprime
ts = msprime.sim_ancestry(10, random_seed=32, sequence_length=100)
ts = msprime.sim_mutations(ts, rate=0.1)
# https://stackoverflow.com/questions/5389507/
def pairwise(iterable):
a = iter(iterable)
return zip(a, a)
site_position = ts.tables.sites.position
rng = np.random.default_rng(1234)
for h1, h2 in pairwise(ts.haplotypes()):
# Choose a breakpoint uniformly along the genome.
# NOTE!! This probably isn't a good model and I haven't thought through
# the details about whether searchsorted is doing the right thing in
# terms of being to the left or right of the breakpoint. Use at your
# own risk!
x = rng.random() * ts.sequence_length
k = np.searchsorted(site_position, x)
print(x, k, site_position[k])
print("H1:", h1)
print("H2:", h2)
unphased1 = h1[:k] + h2[k:]
unphased2 = h2[:k] + h1[k:]
print("U1:", unphased1)
print("U2:", unphased2)
print() This pairs up the sample nodes two-by-two, simulates a phase-switch breakpoint and then swaps over the haplotypes. Unfortunately there's no support for getting this output to something like VCF. Does this help? |
Beta Was this translation helpful? Give feedback.
-
I was imagining taking the tables and adding a new set of individuals that inherited from the sample individuals, but with switch error, and making these the samples. Like:
This is awkward because of (1). I think modifying the output might be the way to go? |
Beta Was this translation helpful? Give feedback.
-
Hi msprime community,
I am interested in simulating phase error at a rate of about 1 error per centimorgan. I have been simulating tree sequences of 200 Mbp between approx. 1000 samples and I want phase error on one individual. I am struggling to know how to modify the output tables to achieve? Can this be simulated by msprime?
Thanks,
Alice
Beta Was this translation helpful? Give feedback.
All reactions