Gconcepcion/yak changes #9
Given that alignments are taking less than 24h now, we can probably reconsider forcing GCP to use on-demand.
parameter_meta needs an update.
If the first FASTA in the array is abnormally small, this could result in requesting too little disk space. We've been converting these to something more like:
Int disk_size = ceil(size(reads_fastas, "GB") * 4 + 20)
Int disk_size = ceil((size(query_sequences, "GB") + size(reference, "GB")) * 2 + 20)
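A minimal sketch of how the suggested expression could sit inside a task, assuming WDL 1.0 and a Cromwell-style `disks` runtime attribute. The task name `align_sketch`, the placeholder command, and the `HDD` disk type are hypothetical; only the `disk_size` expression comes from the review.

```wdl
version 1.0

# Hypothetical task for illustration only.
task align_sketch {
  input {
    Array[File] query_sequences
    File reference
  }

  # Size the disk from the total of all inputs rather than from the first
  # FASTA alone, then double it and add a fixed 20 GB of headroom.
  Int disk_size = ceil((size(query_sequences, "GB") + size(reference, "GB")) * 2 + 20)

  command <<<
    echo "requesting ~{disk_size} GB of disk"
  >>>

  output {
    String requested = read_string(stdout())
  }

  runtime {
    # Cromwell-style disk request (assumption; other engines differ).
    disks: "local-disk " + disk_size + " HDD"
  }
}
```

Because `size()` is applied to the whole array, one abnormally small file no longer drives the estimate.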
-@ 3
memory: threads * 8 + " GB"
Or something like this. Maybe define mem_gb in inputs depending on threads, and use memory: mem_gb + " GB"
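One way the `mem_gb` idea could look, as a sketch assuming WDL 1.0. The task name, the default thread count, and the placeholder command are hypothetical; the 8 GB per thread ratio is the one floated in the comment.

```wdl
version 1.0

# Hypothetical task for illustration only.
task threads_mem_sketch {
  input {
    Int threads = 4
    # Default scales with threads but can still be overridden in inputs.json.
    Int mem_gb = threads * 8
  }

  command <<<
    echo "threads=~{threads} mem_gb=~{mem_gb}"
  >>>

  output {
    String settings = read_string(stdout())
  }

  runtime {
    cpu: threads
    memory: mem_gb + " GB"
  }
}
```

Declaring `mem_gb` as an input keeps the default tied to `threads` while leaving room to raise it for unusually large samples.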
"data_index": align_hifiasm.asm_bam_index
}

Pair[ReferenceData,IndexData] align_data = (ref, sample_aligned_bam)
I'd be interested in seeing what this looks like in outputs.json.
- updated parameter_meta
- updated inputs.json
- cleaned up some whitespace
- added comments
- using FASTA file size to estimate depth rather than a separate task; based on Greg's experiments, an uncompressed 10x FASTA is ~60GB
We'll need to update the README as well.
Use FASTA file size to estimate depth for yak count parameters.
…of aligned bam outputs
Add settings for yak. The idea is that singleton k-mers are more likely to be errors, so use a bloom filter (-b37) when we have a lot of data, and no bloom filter when either parent has low coverage.
Also added support for optional alignment to multiple references
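The depth estimate and bloom filter choice described in these commits could be sketched roughly as below, assuming WDL 1.0 and uncompressed parental FASTAs. The workflow name, the input names, and the `min_depth_for_bloom` cutoff are hypothetical, since the commit messages do not state the actual threshold. Only the ~60GB per 10x figure and the `-b37` option come from the thread.

```wdl
version 1.0

# Hypothetical workflow for illustration only.
workflow yak_settings_sketch {
  input {
    Array[File] father_fastas
    Array[File] mother_fastas
    # Assumed cutoff; not taken from the PR.
    Float min_depth_for_bloom = 10.0
  }

  # An uncompressed 10x FASTA is ~60 GB, i.e. roughly 6 GB per 1x of depth.
  Float gb_per_1x = 6.0
  Float father_depth = size(father_fastas, "GB") / gb_per_1x
  Float mother_depth = size(mother_fastas, "GB") / gb_per_1x

  # Enable the bloom filter only when both parents have enough data;
  # if either parent has low coverage, keep singleton k-mers.
  Boolean use_bloom = father_depth >= min_depth_for_bloom && mother_depth >= min_depth_for_bloom
  String yak_count_options = if use_bloom then "-b37" else ""

  output {
    Float estimated_father_depth = father_depth
    Float estimated_mother_depth = mother_depth
    String options = yak_count_options
  }
}
```

Estimating depth from file size avoids a separate counting task, at the cost of assuming uncompressed input and a typical genome size.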