Gconcepcion/yak changes #9

Merged 18 commits on Oct 26, 2023

gconcepcion
Contributor

Add settings for yak. The idea is that singleton k-mers are more likely to be errors, so use the Bloom filter (-b37) when we have a lot of data, and no Bloom filter when either parent has low coverage.
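
The coverage-based choice described here can be sketched in Python. This is only an illustration of the decision, not the workflow's WDL: the helper name `yak_bloom_option` and the cutoff `MIN_PARENT_DEPTH` are hypothetical, and the PR description does not state the threshold actually used. Only the `-b37` flag comes from the description.

```python
# Hypothetical cutoff: the PR description does not state the coverage
# threshold the workflow actually uses.
MIN_PARENT_DEPTH = 10.0


def yak_bloom_option(father_depth: float, mother_depth: float,
                     min_depth: float = MIN_PARENT_DEPTH) -> str:
    """Return the Bloom filter option to pass to `yak count` for the parents.

    With deep coverage, singleton k-mers are mostly errors, so the Bloom
    filter (-b37) is used to drop them. If either parent has low coverage,
    real k-mers may also appear only once, so no Bloom filter option is
    returned.
    """
    if min(father_depth, mother_depth) < min_depth:
        return ""
    return "-b37"
```

Using the lower of the two parental depths means a single shallow parent is enough to switch the filter off, which matches the "either parent" wording.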

Also added support for optional alignment to multiple references.

Collaborator

Given that alignments are taking less than 24h now, we can probably reconsider forcing GCP to use on-demand.

Collaborator

parameter_meta needs an update.

Collaborator

https://github.com/PacificBiosciences/wdl-humanassembly/blob/6c0232749f49dc4451c67a1911b2b5166e958b51/workflows/assemble_genome/assemble_genome.wdl#L101

If the first FASTA in the array is abnormally small, this could result in requesting too little disk space. We've been converting these to something more like:

Int disk_size = ceil(size(reads_fastas, "GB") * 4 + 20)
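
To make the arithmetic concrete, here is a minimal Python sketch of the same calculation over the whole array of FASTA files. The function name `disk_size_gb` and its parameters are hypothetical, and the sketch assumes "GB" means decimal gigabytes (10^9 bytes).

```python
import math
from typing import Sequence


def disk_size_gb(fasta_sizes_bytes: Sequence[int],
                 multiplier: float = 4,
                 padding_gb: int = 20) -> int:
    """Estimate disk to request, in whole GB, from the size of every input.

    Summing all files avoids under-requesting disk when the first FASTA in
    the array happens to be unusually small.
    """
    # Assumption: 1 GB = 10**9 bytes.
    total_gb = sum(fasta_sizes_bytes) / 1e9
    return math.ceil(total_gb * multiplier + padding_gb)
```

For example, a 1 MB file followed by a 50 GB file yields a request based on the combined size, whereas sizing from the first file alone would request little more than the fixed 20 GB padding.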

"data_index": align_hifiasm.asm_bam_index
}

Pair[ReferenceData,IndexData] align_data = (ref, sample_aligned_bam)
Collaborator

I'd be interested in seeing what this looks like in outputs.json.

williamrowell and others added 6 commits September 28, 2023 11:10
- updated parameter_meta
- updated inputs.json
- cleaned up some whitespace
- added comments
- using fasta filesize to estimate depth rather than a separate task; based on Greg's experiments, an uncompressed 10x FASTA is ~60GB
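
The file-size heuristic in the last item can be sketched as follows. This is a rough Python illustration, not the workflow's WDL: the names `GB_PER_10X` and `estimate_depth` are hypothetical, the only figure taken from the commit note is "an uncompressed 10x FASTA is ~60GB", and decimal gigabytes are assumed.

```python
# From the commit note: an uncompressed 10x FASTA is roughly 60 GB.
GB_PER_10X = 60.0


def estimate_depth(uncompressed_fasta_bytes: float) -> float:
    """Estimate sequencing depth (fold coverage) from uncompressed FASTA size.

    Scales linearly from the ~60 GB per 10x reference point, so the result
    is only an approximation of true coverage.
    """
    # Assumption: 1 GB = 10**9 bytes.
    size_gb = uncompressed_fasta_bytes / 1e9
    return size_gb / GB_PER_10X * 10.0
```

The estimate only holds for uncompressed input; a gzipped FASTA would need its size scaled by an assumed compression ratio first.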
@williamrowell
Collaborator

We'll need to update the README as well.

@gconcepcion gconcepcion merged commit de41388 into develop Oct 26, 2023
1 check passed
@williamrowell williamrowell deleted the gconcepcion/yak-changes branch October 27, 2023 15:43
3 participants