What is the best threshold for FDR value #6

Open
ademcan opened this issue Feb 14, 2017 · 3 comments

Comments

ademcan commented Feb 14, 2017

Hi Owen,
I was surprised by some results from my recent TaDa-seq data analysis. I used the polii.gene.call script to identify "expressed" genes in a Dam-Pol2 experiment, but some genes that were supposed to be expressed were not called. So I was wondering whether one could "play" a bit with the FDR threshold. By default, a gene is considered expressed if log2 > 0 and FDR < 0.01. However, FDR is not the same as a p-value, and I thought one could allow a bit more flexibility with FDR than with a p-value. It is explained briefly here: http://www.cbil.upenn.edu/PaGE/fdr.html.
What do you think about it? How did you decide on the 0.01 threshold?
Thank you for your help.
A.

owenjm (Owner) commented Feb 16, 2017

You can certainly use a less stringent FDR if you like, although obviously you'll get more false positive calls. However, I agree that what you're seeing doesn't sound "right" -- for most cell types you should see at least several thousand genes come up with the default parameters. Noisy data will affect the FDR, though, and so will an incorrect normalisation: is the signal very weak or noisy? Do you know that the driver line is OK?
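To put a rough number on the trade-off mentioned above: since the FDR is the expected fraction of false calls among the genes called, the expected number of false positives is approximately the FDR threshold times the number of calls. The sketch below is only that arithmetic; the gene counts are hypothetical placeholders, not values from this dataset, and `expected_false_calls` is an illustrative helper, not part of polii.gene.call.

```python
def expected_false_calls(n_called: int, fdr_threshold: float) -> float:
    """Approximate expected number of false positives among n_called genes.

    Assumes the FDR is well calibrated, so that about fdr_threshold of the
    calls made at that cutoff are false.
    """
    if not 0.0 <= fdr_threshold <= 1.0:
        raise ValueError("fdr_threshold must be between 0 and 1")
    if n_called < 0:
        raise ValueError("n_called must be non-negative")
    return n_called * fdr_threshold


# Hypothetical numbers of called genes at each cutoff (assumptions, not real data).
scenarios = [(0.01, 3000), (0.05, 3500), (0.10, 4000)]

for fdr, n_called in scenarios:
    false_calls = expected_false_calls(n_called, fdr)
    print(f"FDR < {fdr:.2f}: {n_called} calls, ~{false_calls:.0f} expected false positives")
```

In these made-up scenarios the list of calls grows modestly while the expected number of false calls grows much faster, which is the cost of relaxing the cutoff.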

ademcan (Author) commented Feb 17, 2017

Thank you for your reply Owen.
I do indeed identify a few thousand genes as expressed, but some key genes are missing.
In terms of data, what exactly do you mean by noisy or weak (in this particular scenario)?
I didn't perform the lab experiments, but I can check/ask for the driver line.

ademcan (Author) commented Feb 20, 2017

I performed some detailed analyses of the gene I am interested in (ninaE) in control flies.
The log2 ratio is 0.102149532177245, the number of GATC sites is 16 and the FDR is 0.398215110436791.
Based on the FDR, this gene is not considered expressed. As you can see in the "log2 scores" boxplot (the red bar corresponds to the ninaE log2 ratio), I guess we can consider the signal as not weak.

[Image: "log2 scores" boxplot, with a red bar marking the ninaE log2 ratio]

In addition, I plotted all the max scores under the peak for the identified peaks; again, the red bar represents ninaE.

[Image: max scores under the peak for the identified peaks, with a red bar marking ninaE]

I am having a look at the read distribution in IGV to see whether there is too much noise. I don't see anything weird or wrong so far.
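For reference, the default rule quoted earlier in the thread (log2 > 0 and FDR < 0.01) can be applied to the ninaE values reported above at a few candidate cutoffs. This is a minimal sketch, not the polii.gene.call implementation: the `GeneCall` class, the `is_expressed` helper, and the list of alternative thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    """Per-gene summary values (hypothetical container, not the script's format)."""
    name: str
    log2_ratio: float
    fdr: float
    gatc_sites: int


def is_expressed(call: GeneCall, fdr_threshold: float = 0.01, min_log2: float = 0.0) -> bool:
    """Apply the rule described in the thread: log2 > min_log2 and FDR < fdr_threshold."""
    return call.log2_ratio > min_log2 and call.fdr < fdr_threshold


# Values for ninaE as reported in the comment above.
nina_e = GeneCall(name="ninaE", log2_ratio=0.102149532177245,
                  fdr=0.398215110436791, gatc_sites=16)

# Candidate cutoffs; only 0.01 is the default, the others are illustrative.
thresholds = [0.01, 0.05, 0.10, 0.40]
results = {t: is_expressed(nina_e, fdr_threshold=t) for t in thresholds}

for t, called in results.items():
    print(f"FDR < {t:.2f}: ninaE called expressed = {called}")
```

With these values, ninaE passes the log2 criterion but would only be called at a cutoff above about 0.398, a level at which roughly 40% of all calls would be expected to be false. That suggests relaxing the threshold alone is unlikely to be a practical way to recover this gene.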
