RE: Generating a salmon quantified file for generating cell type specific annotation #41

Open
tsetimothy01 opened this issue Dec 5, 2024 · 3 comments

Comments

@tsetimothy01

Hello,

I would like to ask how you would generate the Salmon-quantified file used to create a cell-type-specific annotation. For example, if you have, say, WT vs. MUT, would you need to generate one single .sf file with the two conditions merged together, or just use the WT? For subsetting, does it only work with .sf files, or can it work with other transcript count files? Thanks!

Best,
Timothy

@augustboyle
Collaborator

Hello Timothy,

I have not extensively tested annotation filtering. Going by our publication, I would say the analogous choice would be to use the WT RNA-seq. If you expect very different transcript abundances between conditions, then that may not be the best choice.

You can provide any file with Name and TPM fields. For example, you could quantify both conditions, take the maximum TPM across WT and MUT for each transcript, and make a new table so that all expressed transcripts are included.
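A rough sketch of building that max-TPM table, assuming both runs produced standard tab-separated Salmon quant.sf files with Name and TPM columns. The function names here are hypothetical and not part of the tool, and the output keeps only Name and TPM, so any other columns the subset script might read would need to be carried along:

```python
import csv


def read_tpm(path):
    """Return {transcript name: TPM} from a tab-separated file with Name and TPM columns."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}


def merge_max_tpm(paths, out_path):
    """Write a Name/TPM table holding the maximum TPM seen for each transcript."""
    merged = {}
    for path in paths:
        for name, tpm in read_tpm(path).items():
            # Keep the larger value so a transcript expressed in either
            # condition stays above whatever TPM threshold is applied later.
            merged[name] = max(tpm, merged.get(name, 0.0))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Name", "TPM"])
        for name, tpm in merged.items():
            writer.writerow([name, tpm])
    return merged
```

It could be called as `merge_max_tpm([wt_quant, mut_quant], merged_out)`, where the three arguments are placeholder paths to the WT quant file, the MUT quant file, and the table to write.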

A major consideration is what you plan to compare across the whole study, since it may be preferable to call the same windows for all samples.

@tsetimothy01
Author

Hey Evan,

I have additional questions about subsetting, and it seems I'm having problems creating an output cell type annotation file. My "quant.sf" looks like this:

[image: quant.sf screenshot]
and it seems that the transcript IDs match those in the full annotation file:
[image: annotation file screenshot]
But when I try to create the subset with:

python /path/to/subset_gff.py -t 1 -a /path/to/annotations/gencode.v38.annotation.gff3.gz -q /path/to/annotations/salmon_merge.sf -o gencode.v38.annotation.ipsc_totalrna.gtz.gff3.gz

I didn't get any output. Any help would be useful. Thanks!

@augustboyle
Collaborator

Hello, I don't have the .sf files I made, or the full info column in your file with the transcript_id info, but as far as I can tell the names do not match: your Salmon .sf file has a delimited list of transcript_id, gene_id, strand, gene name, etc., whereas your annotation file likely just has the transcript_id under the transcript_id tag.

If you made one match the other, it would likely work. Or, with less work, you could edit one line of the subset script to strip out the extra delimited content so that the names match the annotation file.
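As an alternative to editing the script, a minimal sketch that rewrites the quant file so each Name keeps only its first delimited field. It assumes the fields are separated by `|` with the transcript_id first, as in GENCODE transcript FASTA headers; the function name is hypothetical and this has not been run against the files in this thread:

```python
import csv


def strip_names(in_path, out_path, delimiter="|"):
    """Copy a tab-separated quant table, trimming Name to its first delimited field.

    Returns the number of data rows written.
    """
    rows_written = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for row in reader:
            # "ENST...|ENSG...|..." becomes "ENST..."; names without the
            # delimiter are left unchanged.
            row["Name"] = row["Name"].split(delimiter, 1)[0]
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

The cleaned file could then be passed to the subset script with `-q`. If re-running Salmon is an option, building the index with its `--gencode` flag should yield bare transcript IDs in the first place, though that is worth confirming against the Salmon documentation for the version in use.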
