I was also thinking we should create a custom set of references for the CI workflow.
Here is what I am thinking:
Find a dataset with a differentially expressed gene
The DE gene should be covered only by uniquely mapped reads (reads that map to a single location). This is so we can spike this gene into a pre-computed counts matrix later on.
Optional: Differential expression is validated through a secondary method
Extract these uniquely mapped reads for said DE gene to create the following:
Sub-sampled fastq files for testing purposes
Custom reference files (with a custom ref.fa and genes.gtf)
The ref.fa should contain only the sequence for the gene of interest (you can pad it with +/- 10 kb). The GTF file will have to be modified to match the new ref.fa, and it should contain only our gene of interest.
Do you have some time to look into this more?
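To make the ref.fa/GTF idea above concrete, here is a minimal sketch in plain Python, under a few assumptions: the chromosome sequence is already loaded as a string, the GTF uses the standard 9 tab-separated columns with 1-based inclusive coordinates, and the helper names (`padded_window`, `extract_window`, `shift_gtf_line`) are made up for illustration. It cuts a padded window around the gene and re-bases the gene's GTF records onto that window.

```python
def padded_window(gene_start, gene_end, chrom_len, pad=10_000):
    """Return the 1-based inclusive window around the gene, clipped to the chromosome."""
    start = max(1, gene_start - pad)
    end = min(chrom_len, gene_end + pad)
    return start, end


def extract_window(chrom_seq, win_start, win_end):
    """Slice the chromosome string using 1-based inclusive coordinates."""
    return chrom_seq[win_start - 1:win_end]


def shift_gtf_line(line, gene_id, win_start, new_seqname):
    """Re-base one GTF record onto the mini reference.

    Returns None for comments, malformed lines, and records of other genes.
    """
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or f'gene_id "{gene_id}"' not in fields[8]:
        return None
    offset = win_start - 1  # window position 1 corresponds to win_start
    fields[0] = new_seqname
    fields[3] = str(int(fields[3]) - offset)
    fields[4] = str(int(fields[4]) - offset)
    return "\t".join(fields)


# Toy example: a 40 bp "chromosome", a gene at 11-20, padded by 5 bp.
chrom_seq = "ACGT" * 10
win_start, win_end = padded_window(11, 20, len(chrom_seq), pad=5)
mini_ref = extract_window(chrom_seq, win_start, win_end)
gtf_line = 'chr1\ttest\texon\t11\t20\t.\t+\t.\tgene_id "GENE1"; transcript_id "T1";'
shifted = shift_gtf_line(gtf_line, "GENE1", win_start, "GENE1_ref")
```

The strand, attribute, and feature columns are left untouched; only the sequence name and the start/end columns change, so the shifted records should still describe the same bases in the new ref.fa.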
Yes, I can look into this.
My 2 cents: do we really need to create a custom ref.fa and custom GTF file? If the uniquely aligning reads are preselected for the gene locus, they should only align there even with the full ref.fa/GTF files. Are you thinking that using the full ref.fa/GTF will be constrained by the limited compute resources available for CI via GitHub Actions? If that is the case, then I agree we should create stripped-down versions of the ref.fa and genes.gtf files.
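For reference, preselecting uniquely aligning reads at a locus could look roughly like the sketch below. It assumes the alignments are available as SAM text and that the aligner emits the optional `NH:i:` tag (number of reported alignments for the read), which not every aligner does; the function name `unique_read_names` and the toy records are hypothetical.

```python
def unique_read_names(sam_lines, chrom, start, end):
    """Collect names of reads that align once (NH:i:1) and start inside the locus.

    Coordinates are 1-based inclusive, matching the SAM POS field.
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:  # read is unmapped
            continue
        if rname != chrom or not (start <= pos <= end):
            continue
        if "NH:i:1" in fields[11:]:  # optional tags follow the 11 mandatory columns
            names.add(qname)
    return names


# Toy SAM records: one unique read, one multi-mapper, one read on another chromosome.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t120\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:4",
    "r3\t0\tchr2\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
]
selected = unique_read_names(sam, "chr1", 50, 200)
```

The resulting read names can then be used to pull the matching records out of the original fastq files to build the sub-sampled test inputs.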
This will also ensure that the workflow runs as efficiently as possible, but I see what you're saying. This may be overkill to a certain extent. We could start off by just limiting the ref.fa and genes.gtf to the chromosome of interest.
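As a rough sketch of the chromosome-only option, assuming plain-text (uncompressed) FASTA and GTF input read line by line, and with made-up helper names `filter_fasta` and `filter_gtf`:

```python
def filter_fasta(lines, chrom):
    """Yield only the FASTA record whose sequence name equals chrom."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] == chrom
        if keep:
            yield line


def filter_gtf(lines, chrom):
    """Yield comment lines plus GTF records located on chrom."""
    for line in lines:
        if line.startswith("#") or line.split("\t", 1)[0] == chrom:
            yield line


# Toy inputs with two chromosomes each.
fasta = [">chr1 first\n", "ACGT\n", ">chr2\n", "GGCC\n", "TTAA\n"]
gtf = [
    "#!genome-build toy\n",
    'chr1\ttest\tgene\t1\t4\t.\t+\t.\tgene_id "A";\n',
    'chr2\ttest\tgene\t1\t8\t.\t-\t.\tgene_id "B";\n',
]
chr2_fasta = list(filter_fasta(fasta, "chr2"))
chr2_gtf = list(filter_gtf(gtf, "chr2"))
```

Both functions are generators, so they can be fed open file handles and written straight back out without holding the full reference in memory.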