Custom 3'UTR #72
Thanks for the interest! Yes, you could build a custom target that only quantifies reads in the regions of interest. To plug into this pipeline, you would indeed provide a kallisto index ("kdx"), a GTF, and a TSV merge annotation. You would then add a target entry like the following:

```yaml
custom_utrs:
  path: "extdata/targets/custom_utrs/"
  genome: "hg38"
  gtf: "custom_utrs.gtf"
  kdx: "custom_utrs.kdx"
  merge_tsv: "custom_utrs.merge.tsv"
  tx_annots: null
  gene_annots: null
  download_script: null
```

Caveats

I'll just note some caveats about taking this approach as opposed to adding the custom 3'UTRs to the full annotation.

- Identifying cells: Valid cell barcodes would need to come from previous data processing. Otherwise, the targeted regions alone may not be sufficient to discriminate high-quality cells from low-quality cells or background.
- Comparing across cells or samples: Normalization (size) factors would also need to come from previous data processing. With only the targeted regions, it would be unclear whether higher counts were due to higher expression, a higher capture rate, deeper sequencing, or some mixture.
- Multimapping reads: Reads that would multimap in a full annotation might map uniquely in a targeted subset, leading to overestimated counts. One should rule this out before trusting the targeted results. You'd probably want to prepare a full index (full UTRome + custom novel 3'UTRs) and then check whether any of the k-mers from the targeted regions are shared with those in non-targeted regions. If they are, you may need to include the other transcripts that share k-mers to make the assignment fair. That is, one doesn't want expression changes in some other gene to show up as isoform-specific expression in a targeted isoform because the alternative loci from which the reads may have originated were excluded.

On the last point, you could also do some empirical spot checks. For example, run some samples with the full UTRome + novel 3'UTRs and separately with just the targeted index, then compare the results. If the counts for the targeted UTRs do not come out identical, that would point to multimapping issues.

Hope that's helpful! Let me know if I can answer any more questions.
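To make the normalization caveat concrete, here is a minimal sketch assuming per-cell size factors are derived from total counts in an earlier full-annotation run and then applied to the targeted counts. The simple total-count size factor and the function names are illustrative choices, not part of the pipeline.

```python
def size_factors_from_totals(full_totals):
    """Simple library-size factors from an earlier full-annotation run.

    full_totals maps cell barcode -> total counts over the full annotation.
    Factors are scaled so that their mean is 1.
    """
    mean_total = sum(full_totals.values()) / len(full_totals)
    return {cell: total / mean_total for cell, total in full_totals.items()}


def normalize_targeted(targeted_counts, size_factors):
    """Divide each cell's targeted UTR counts by its externally derived size factor."""
    normalized = {}
    for cell, counts in targeted_counts.items():
        sf = size_factors[cell]  # a KeyError here means the cell was absent from the full run
        if sf <= 0:
            raise ValueError(f"size factor for {cell} must be positive, got {sf}")
        normalized[cell] = {utr: n / sf for utr, n in counts.items()}
    return normalized


# Illustrative toy input: cell "AAAC" was sequenced twice as deeply as "TTTG".
full_totals = {"AAAC": 2000, "TTTG": 1000}
targeted = {"AAAC": {"utr_1": 20}, "TTTG": {"utr_1": 10}}
print(normalize_targeted(targeted, size_factors_from_totals(full_totals)))
```

In this toy case the raw targeted counts differ twofold purely because of depth; after dividing by the externally derived factors, both cells end up with the same normalized count (up to floating-point rounding). Without the full-run totals, the targeted counts alone could not distinguish these two situations.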
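The k-mer check described in the multimapping caveat could be approximated outside kallisto with a sketch like the one below. It assumes the transcript sequences are available as FASTA (for example, the sequences used to build the full index) and defaults to k = 31, kallisto's default k-mer size; all helper names are hypothetical. The small targeted set is indexed, and the background (non-targeted) transcripts are scanned against it. Both strands are compared as a conservative choice.

```python
def read_fasta(path):
    """Minimal FASTA reader: returns {first word of header: sequence}."""
    seqs, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def kmers(seq, k):
    """Set of all length-k substrings of seq (upper-cased); empty if seq is shorter than k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement; any non-ACGT character becomes N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(b, "N") for b in reversed(seq.upper()))


def shared_kmer_report(targeted, background, k=31):
    """Map each targeted transcript ID to the background transcript IDs sharing a k-mer.

    targeted, background: dicts of transcript ID -> sequence. The background
    must exclude the targeted transcripts themselves, or everything self-matches.
    """
    # Index k-mers of the targeted set on both strands.
    index = {}
    for tx_id, seq in targeted.items():
        for km in kmers(seq, k) | kmers(revcomp(seq), k):
            index.setdefault(km, set()).add(tx_id)
    # Scan every background transcript against the targeted index.
    report = {}
    for bg_id, seq in background.items():
        for km in kmers(seq, k):
            for tx_id in index.get(km, ()):
                report.setdefault(tx_id, set()).add(bg_id)
    return report


# Usage (hypothetical file names):
# targeted = read_fasta("custom_utrs.fa")
# full = read_fasta("full_utrome_plus_custom.fa")
# background = {tx: s for tx, s in full.items() if tx not in targeted}
# for utr_id, others in sorted(shared_kmer_report(targeted, background).items()):
#     print(utr_id, sorted(others))
```

Any background transcript reported for a targeted UTR is a candidate to add to the targeted index so that reads it could explain are not forced onto the targeted isoform.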
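One possible shape for the empirical spot check: assuming both runs have already been reduced to per-cell count dicts (for example, from the count matrices each run produces), the sketch below lists every targeted UTR whose count differs between the full-index run and the targeted-index run. The function names and data layout are illustrative assumptions. Targeted-run counts that exceed the full-run counts would match the overestimation pattern described in the multimapping caveat.

```python
import math


def compare_targeted_counts(full, targeted, target_ids, abs_tol=0.0):
    """Report targeted UTRs whose counts differ between two runs of the same cells.

    full, targeted: dicts mapping cell barcode -> {transcript ID: count}.
    target_ids: the UTR IDs present in the targeted index.
    Returns {(cell, utr_id): (full_count, targeted_count)} for every mismatch.
    With abs_tol=0.0 (the default), only exact agreement counts as a match.
    """
    diffs = {}
    for cell in sorted(set(full) | set(targeted)):
        full_cell = full.get(cell, {})
        targeted_cell = targeted.get(cell, {})
        for utr_id in sorted(target_ids):
            f = full_cell.get(utr_id, 0)
            t = targeted_cell.get(utr_id, 0)
            if not math.isclose(f, t, rel_tol=0.0, abs_tol=abs_tol):
                diffs[(cell, utr_id)] = (f, t)
    return diffs


def summarize(diffs):
    """Count mismatches where the targeted run is higher vs. lower than the full run."""
    higher = sum(1 for f, t in diffs.values() if t > f)
    return {"targeted_higher": higher, "targeted_lower": len(diffs) - higher}
```

Exact comparison is the default because the suggestion is that the counts should come out identical; a nonzero `abs_tol` is only a convenience if the counts being compared are non-integer estimates.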
Hello,
I have a set of custom novel 3'UTRs that I would like to quantify in single-cell data.
Ideally, for speed's sake, I would quantify just the 100 or so 3'UTRs that I'm interested in.
What would I need to build a minimal working UTRome kallisto index, GTF, and TSV merge annotation for my custom set of 3'UTRs?
Thank you in advance!
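For the index and GTF inputs, here is a minimal sketch that writes a transcript FASTA and a GTF for a list of custom 3'UTR intervals. It assumes the intervals are in BED-style (0-based, half-open) coordinates and that the genome is already loaded into a dict of chromosome sequences; the `Utr` record and all helper names are hypothetical. The merge TSV's column layout is pipeline-specific, so it is not attempted here; mirroring the merge TSV of an existing target is probably the safest route.

```python
from dataclasses import dataclass


@dataclass
class Utr:
    """A custom 3'UTR interval; start/end are 0-based, half-open (BED-style)."""
    utr_id: str
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def utr_sequence(genome, utr):
    """Slice the UTR from an in-memory genome dict, reverse-complementing on '-'."""
    seq = genome[utr.chrom][utr.start:utr.end]
    return seq.translate(_COMP)[::-1] if utr.strand == "-" else seq


def gtf_lines(utr, source="custom"):
    """Transcript and exon records; GTF coordinates are 1-based and inclusive."""
    attrs = f'gene_id "{utr.gene_id}"; transcript_id "{utr.utr_id}";'
    return [
        "\t".join([utr.chrom, source, feature, str(utr.start + 1), str(utr.end),
                   ".", utr.strand, ".", attrs])
        for feature in ("transcript", "exon")
    ]


def write_targets(genome, utrs, fasta_path, gtf_path, width=60):
    """Write one FASTA record and one transcript/exon GTF pair per UTR."""
    with open(fasta_path, "w") as fa, open(gtf_path, "w") as gtf:
        for utr in utrs:
            seq = utr_sequence(genome, utr)
            fa.write(f">{utr.utr_id}\n")
            for i in range(0, len(seq), width):
                fa.write(seq[i:i + width] + "\n")
            for line in gtf_lines(utr):
                gtf.write(line + "\n")
```

After `write_targets(genome, utrs, "custom_utrs.fa", "custom_utrs.gtf")`, the FASTA can be indexed with `kallisto index -i custom_utrs.kdx custom_utrs.fa`, which uses kallisto's default k-mer size of 31. Using the UTR ID as the FASTA header keeps the index's transcript names consistent with the `transcript_id` values in the GTF.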