in silico deletion script: hound_isd_bed #43
Overall, looks like it should work! Just a few comments to improve.
scores_h5.create_dataset("seqs", dtype="bool", shape=(num_seqs, options.mut_len, 4))
for snp_stat in options.snp_stats:
    scores_h5.create_dataset(
        snp_stat, dtype="float16", shape=(num_seqs, options.mut_len, 4, num_targets)
Shouldn't the 3rd axis disappear? In ISM, it represents the alternative nucleotides, but they don't exist here.
    ref_preds_stitch, alt_preds, options.snp_stats, None
)
for snp_stat in options.snp_stats:
    scores_h5[snp_stat][si, mi - mut_start, 0] = ism_scores[snp_stat]
By dropping the 3rd axis, you can remove the trailing ", 0" index here.
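To make the two comments about the 3rd axis concrete, here is a minimal sketch of the layout change. It uses plain NumPy arrays as stand-ins for the HDF5 datasets, and all sizes and index values are hypothetical, chosen only for illustration; it is not the script's actual code.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_seqs, mut_len, num_targets = 2, 5, 3
mut_start = 10

# ISM-style layout: the axis of size 4 holds the alternative nucleotides.
ism_scores = np.zeros((num_seqs, mut_len, 4, num_targets), dtype="float16")

# ISD-style layout: a deletion has no alternative nucleotide, so that axis is dropped.
isd_scores = np.zeros((num_seqs, mut_len, num_targets), dtype="float16")

si, mi = 1, 12  # sequence index and absolute mutation position (illustrative)
position_scores = np.arange(num_targets, dtype="float16")  # stand-in for per-target scores

# With the nucleotide axis gone, no trailing ", 0" index is needed.
isd_scores[si, mi - mut_start] = position_scores
```

The same shape and indexing would apply to a dataset created with `shape=(num_seqs, options.mut_len, num_targets)`.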
ref_preds.append(ref_preds_shift)

# for mutation positions
for mi in range(mut_start, mut_end):
If the deletion size is >1, I think you'd want to advance your index by the size. Otherwise, you're deleting overlapping k-mers, and I can't think of a scenario where you'd prefer that over the single-nt deletions.
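One way to read this suggestion is to step the position loop by the deletion size. The sketch below is an assumption about how that could look, not the script's code: `deletion_starts` is a hypothetical helper, and dropping a final window that does not fit entirely inside the region is a choice made here for illustration.

```python
def deletion_starts(mut_start, mut_end, del_len=1):
    """Return start positions of non-overlapping deletions in [mut_start, mut_end)."""
    if del_len < 1:
        raise ValueError("del_len must be a positive integer")
    # Step by del_len so consecutive deletions share no nucleotides.
    # Assumption: only windows that fit entirely inside the region are kept.
    return list(range(mut_start, mut_end - del_len + 1, del_len))


# Single-nt deletions visit every position; 3-nt deletions advance by 3.
single = deletion_starts(100, 110, del_len=1)
triple = deletion_starts(100, 110, del_len=3)
```

With a stride larger than 1, the number of scored positions shrinks accordingly, so the length of the position axis in the output datasets would need to match.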
)
parser.add_option(
    "-s",
    dest="del_len",
"del_size" maybe so -s matches the first letter?
Description of your changes
Added a new script, hound_isd_bed.py, analogous to hound_ism_bed.py. The ISD script performs in silico deletions instead of in silico mutations.
Stitching is performed on the reference to avoid doubling the deleted portion of the sequence in the left and right shifts of the alternative.
New argument: "-s", dest="del_len" (Deletion size for ISD [Default: 1])
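For reference, the new option can be exercised in isolation. This is a minimal sketch that assumes the script uses the standard-library `optparse` module (the diff shows `parser.add_option`); the `type`, `default`, and `help` arguments are filled in from the description above and are assumptions rather than a copy of the script.

```python
from optparse import OptionParser

parser = OptionParser("usage: %prog [options]")
parser.add_option(
    "-s",
    dest="del_len",  # a reviewer suggested "del_size" so the name matches -s
    default=1,
    type="int",
    help="Deletion size for ISD [Default: %default]",
)

# Parse an explicit argument list instead of sys.argv for the demonstration.
options, args = parser.parse_args(["-s", "3"])
default_options, _ = parser.parse_args([])
```

Renaming the destination only changes the attribute name on `options`; the command-line flag stays `-s`.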
Type of change
(If applicable) How has this been tested?
Tested on the MPRA deletion dataset (M Kircher, Nat Comm 2019): F9, HBG1 and LDLR gene promoters.