RE: Generating a salmon quantified file for generating cell type specific annotation #41

tsetimothy01 · 2024-12-05T13:49:27Z

Hello,

I would like to ask how you would generate the salmon quantification file used to build a cell-type-specific annotation. For example, if you have, let's say, WT vs. MUT, would you need to generate a single .sf file with the two conditions merged together, or just use the WT? For subsetting, does it only work with .sf files, or can it work with other transcript count files? Thanks!

Best,
Timothy

augustboyle · 2024-12-06T04:56:21Z

Hello Timothy,

I have not extensively tested annotation filtering. For our publication, I would say the analogous choice would be to use the WT RNA seq. If you expect very different transcript abundances then that may not be the best choice.

You can provide any file with Name and TPM fields. For example, you could run salmon on both conditions, take the max TPM value across WT and MUT for each transcript, and make a new table so that all expressed transcripts are included.
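As a rough sketch of that merge (not something from the repository): the snippet below reads two or more tab-delimited salmon tables, keeps the highest TPM seen for each transcript, and writes a new table with only the Name and TPM columns. The function names and the file names in the usage note are hypothetical, and it assumes every input has a header row containing `Name` and `TPM`.

```python
import csv


def read_tpm(path):
    """Return {transcript name: TPM} from a tab-delimited table with Name and TPM columns."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}


def merge_max_tpm(paths, out_path):
    """Write a Name/TPM table holding the maximum TPM per transcript across all inputs."""
    merged = {}
    for path in paths:
        for name, tpm in read_tpm(path).items():
            # Keep the larger value; transcripts missing from a file count as 0.
            merged[name] = max(tpm, merged.get(name, 0.0))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Name", "TPM"])
        for name in sorted(merged):
            writer.writerow([name, merged[name]])
    return merged
```

For example, `merge_max_tpm(["wt_quant.sf", "mut_quant.sf"], "salmon_merge.sf")` would produce one table covering transcripts expressed in either condition. Taking the maximum rather than the mean keeps a transcript that is expressed in only one condition from being diluted below an expression cutoff.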

A large consideration is what you plan to compare for the whole study since it may be preferable to call all the same windows for all samples.

tsetimothy01 · 2024-12-12T15:15:51Z

Hey Evan,

I have additional questions about subsetting & it seems I'm having problems with creating an output cell type annotation file. My "quant.sf" looks like this:

and it seems that the transcript IDs match those in the full annotation file.

But when I try to create the subset with:

python /path/to/subset_gff.py -t 1 -a /path/to/annotations/gencode.v38.annotation.gff3.gz -q /path/to/annotations/salmon_merge.sf -o gencode.v38.annotation.ipsc_totalrna.gtz.gff3.gz

I didn't get any output. Any help would be useful. Thanks!

augustboyle · 2024-12-21T21:16:49Z

Hello, I don't have the .sf files I made, and I can't see the full info column in your file with the transcript_id info, but as far as I can tell the names do not match: your salmon .sf file has a delimited list of transcript_id, gene_id, strand, gene name, etc., whereas your annotation file likely has only the transcript_id under the transcript_id tag.

If you made one match the other, it would likely work. Or, with less work, you could edit the subset script in one line to strip out the other delimited content to match the annotation file.
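One way to make the .sf names match the annotation without touching the subset script is to rewrite the Name column first. This is a minimal sketch, assuming the extra fields are separated by `|` (as in GENCODE transcript FASTA headers) and that the transcript_id is the first field; the delimiter, function names, and file names are assumptions, so check them against your own quant.sf.

```python
import csv


def strip_name(name, delimiter="|"):
    """Keep only the text before the first delimiter (assumed to be the transcript_id)."""
    return name.split(delimiter, 1)[0]


def simplify_names(in_path, out_path, delimiter="|"):
    """Copy a tab-delimited salmon table, reducing each Name to its transcript_id."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for row in reader:
            row["Name"] = strip_name(row["Name"], delimiter)
            writer.writerow(row)
```

Running `simplify_names("salmon_merge.sf", "salmon_merge.clean.sf")` and passing the cleaned file to `-q` should give names that line up with the transcript_id tag, provided the IDs themselves (including any version suffix) are otherwise identical. If you are re-running salmon anyway, building the index with `salmon index --gencode` may avoid the problem at the source, since that flag truncates GENCODE names at the first `|`.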
