using scUTRboot to calculate LUI #60

rhea9184 · 2022-09-30T18:49:57Z

Hi,

I have some scRNAseq data that I ran through the scUTRquant pipeline and would like to now calculate the LUI for the transcripts. Is there a pipeline or some code from the authors I might be able to use?

Thank you so much for your help!

mfansler · 2022-09-30T20:54:53Z

Hi @rhea9184, thanks for the interest!

We don't have a tutorial or pipeline for this downstream analysis yet. However, all the code used to generate the Figures for the manuscript can be found in the scUTRquant-figures repository. Figures 2 (LUI point estimates for batches), 4 (analysis of mouse data), and 5 (analysis when using custom txcutr annotations) would be the most relevant. Note that the knitted branch includes rendered HTML versions.

There is an example of using scUTRboot to do statistical testing of two clusters in the Fig. 4c file, including both LUI tests and the WD tests. For estimating LUIs, without testing, there is code in the Fig. 4ab file that performs bootstrap estimates.
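For readers who want a feel for what a bootstrap LUI estimate involves before opening those files, here is a minimal base R sketch. It is not the scUTRboot API or the code from the figures repository. It assumes that LUI is the fraction of a gene's UMIs that come from the long 3'UTR isoform, pooled over cells, and that bootstrapping means resampling cells with replacement. The names `lui` and `boot_lui` and the toy counts are hypothetical.

```r
# Toy per-cell UMI counts for one gene (hypothetical data).
set.seed(1)
n_cells <- 200
short_umis <- rpois(n_cells, lambda = 2)  # short 3'UTR isoform
long_umis  <- rpois(n_cells, lambda = 1)  # long 3'UTR isoform

# Assumed definition: LUI = long UMIs / (long + short UMIs), pooled over cells.
lui <- function(long, short) sum(long) / (sum(long) + sum(short))

# Hypothetical helper: resample cells with replacement and recompute LUI.
boot_lui <- function(long, short, n_boot = 1000) {
  vapply(seq_len(n_boot), function(i) {
    idx <- sample.int(length(long), replace = TRUE)
    lui(long[idx], short[idx])
  }, numeric(1))
}

boots <- boot_lui(long_umis, short_umis)
point_estimate <- lui(long_umis, short_umis)
ci <- quantile(boots, c(0.025, 0.975))  # percentile bootstrap interval
```

The spread of `boots` gives a sense of how stable the estimate is for a given number of cells, which is the same question the cell-number thresholds discussed later in this thread address.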

Please don't hesitate to ask for any clarifications/help. I'm planning to add functionality and examples to scUTRboot in the coming weeks, so feedback is welcome!

rbarbieri86 · 2024-07-18T14:55:13Z

Hello,

Sorry to hijack an older thread, but I wanted to ask whether there is any news on a tutorial for scUTRboot. The code for the figures is mostly clear and commented, but it would really help to have something basic to use as a reference.

I have managed to run scUTRquant on an older dataset with a limited number of usable cells (<3000) and managed to calculate WUI between two relatively abundant (>500) cell types. After q-value filtering I get 38 genes, but it would be nice to have some sort of baseline to understand whether this number is small or not. What would be the minimum number of cells from tissue samples for scUTRquant/scUTRboot to work properly? Having a standard procedure would clear up a lot of these doubts.

Thank you in advance!

mfansler · 2024-07-19T07:06:57Z

Hi @rbarbieri86, thanks for the interest! Apologies that this has not been resolved after so long.

I will try to make some time for it this weekend, but in the meantime I can give some indication as to what I usually see.

Cell numbers for differential analysis. In older data - e.g., Tabula Muris - usually 50 cells was the absolute minimum for getting anything, and we found that >=200 cells were needed to characterize 3'UTR usage transcriptome-wide. These numbers are also sensitive to sequencing depth, so if you have more deeply sequenced cells, you can expect more sensitivity. For example, in processing the Perturb-seq data for the paper we could include perturbations with >=30 cells. With 500+ cells per cell type, there should be no issue with sensitivity - if something is there, it should be detected. Only 38 genes is on the low side, so I would regard that as not much difference between your cell types.

Choosing WUI weights. Another factor is how you do the testing, and specifically how many genes you test. Similar to the independent filtering done in DESeq2, it is a good idea to filter out any tests that are insufficiently powered prior to testing. This is in part what we do by providing the wt_atlas_* weights - even though we can measure many isoforms, we already know from our surveys that many isoforms never contribute more than 10% of the UMIs in a gene, so there is no point in ever testing them. Also, our lab had traditionally been less concerned with IPA changes, so we almost always used the wt_atlas_no_ipa weights, which is the most powerful pre-defined set of weights.
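To illustrate the idea behind this kind of pre-filtering, here is a minimal base R sketch that drops isoforms contributing less than 10% of a gene's UMIs. It is a single-dataset simplification and not how the wt_atlas_* weights were built (those come from surveys across many cell types). The toy counts and variable names are hypothetical.

```r
# Toy isoform-level UMI totals (hypothetical data); names are isoform IDs.
umis <- c(tx1 = 900, tx2 = 80, tx3 = 20, tx4 = 50, tx5 = 50)
gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneA",
          tx4 = "geneB", tx5 = "geneB")

# Fraction of each gene's UMIs contributed by each isoform.
frac <- umis / ave(umis, gene, FUN = sum)

# Keep isoforms contributing at least 10% of the gene's UMIs.
keep <- frac >= 0.10
kept_isoforms <- names(umis)[keep]
```

In this toy example `tx2` and `tx3` fall below the threshold, so geneA is left with a single isoform and would drop out of any isoform-usage test, which reduces the number of hypotheses that need multiple-testing correction.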

Minimum cells expressed. In scUTRboot there is a parameter for the minimum number of cells that a gene must be expressed in to run a test. This is also important to set judiciously because it acts to exclude underpowered genes from being tested. We set the default minimum of 50 cells for differential testing across cell types.
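As a rough sketch of what such a filter does (not the scUTRboot implementation; whether the package counts cells per group or overall is not stated here), the base R snippet below counts the cells with at least one UMI for each gene and keeps genes at or above the threshold. The matrix and variable names are hypothetical.

```r
# Toy gene-by-cell UMI matrix (hypothetical data): 3 genes x 120 cells.
counts <- rbind(
  geneA = rep(c(1, 0), times = c(100, 20)),  # expressed in 100 cells
  geneB = rep(c(2, 0), times = c(30, 90)),   # expressed in 30 cells
  geneC = rep(0, 120)                        # never detected
)
min_cells <- 50  # the default minimum mentioned above

# Number of cells with at least one UMI for each gene.
n_expressing <- rowSums(counts > 0)

# Genes that pass the minimum-cells filter and would be tested.
testable <- names(n_expressing)[n_expressing >= min_cells]
```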

rbarbieri86 · 2024-07-25T08:11:57Z

Hi Dr. Fansler,

Thank you very much for your rapid reply and apologies for my quite late one.

The info you just shared is already useful, but maybe I can share a bit more of what I have done to give a clearer idea.

I have downloaded the FastQ files from this paper: https://www.ahajournals.org/doi/10.1161/CIRCRESAHA.117.312509
I ran scUTRquant and obtained the SCE files (I also tried Seurat with the intermediate MTX files but ran into issues), then used SingleR to assign cell types. This was necessary because, even after some filtering, I ended up with roughly three times more cells than the processed data available for download has in Seurat (~3k vs ~1k).

After that I tried running the scUTRboot WUI test between Macrophages and Monocytes, calculating the weights as below (your code, if I remember correctly):

    rowData(Athero_SCE_txs_qc) %<>%
      as_tibble() %>%
      group_by(gene_id) %>%
      mutate(utr_rank = rank(utr_position),
             utr_wt = (utr_rank - 1) / (max(utr_rank) - 1)) %>%
      ungroup() %>%
      as.data.frame() %>%
      DataFrame(row.names = .$transcript_id) %T>%
      { stopifnot(all(rownames(.) == rownames(Athero_SCE_txs_qc))) }
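To make the rank-based weights in the snippet above concrete, here is a base R sketch of the same calculation on a toy annotation table. It assumes only the columns used above (`transcript_id`, `gene_id`, `utr_position`); the data and the helper variables `max_rank` and `single_isoform` are hypothetical.

```r
# Toy isoform annotation (hypothetical data).
txs <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3", "tx4", "tx5", "tx6"),
  gene_id       = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC"),
  utr_position  = c(300, 1200, 2500, 150, 900, 400)
)

# Rank isoforms within each gene by position, then rescale ranks to [0, 1].
txs$utr_rank <- ave(txs$utr_position, txs$gene_id, FUN = rank)
max_rank <- ave(txs$utr_rank, txs$gene_id, FUN = max)
txs$utr_wt <- (txs$utr_rank - 1) / (max_rank - 1)

# Genes with a single isoform give 0/0 = NaN.
single_isoform <- is.nan(txs$utr_wt)
```

Within each gene the most proximal isoform gets weight 0, the most distal gets 1, and the others are spaced evenly in between. Every isoform in the annotation receives a weight, including rare ones, which is the main difference from a pre-filtered weight set. Single-isoform genes end up with `NaN` weights, so it may be worth checking how many of those are present before testing.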

The number of bootstraps and the minimum cells were left at the standard values (10k and 50). That is how I get the number 38.

Should I try using wt_atlas_no_ipa instead?

mfansler added the documentation Improvements or additions to documentation label Aug 4, 2023
