How to use APARENT to study pPAS and dPAS #9

Open
yangjywhu opened this issue Feb 15, 2023 · 2 comments

@yangjywhu

Hello!
APARENT is a great piece of software! I would like to use APARENT to make predictions for the sequences around the pPAS and dPAS in my study, but I have run into the following questions:

  1. Should I use APARENT or APARENT2?
  2. For every gene I want to study, I have identified a pPAS and a dPAS. Should I use Notebook 1: APA Isoform & Cleavage Prediction? If so, how should I set site_distance, prox_cut_start, prox_cut_end (and the dist_ equivalents)? What do 'Non-normalized proximal sum-cut logit', 'Non-normalized distal sum-cut logit' and 'Predicted proximal vs. distal isoform % (APADB)' mean? As you suggested in "How to get PAS sequence? #1", I used the sequence starting 100 nt upstream of each poly-A site (proximal and distal), 205 nt in total, as the input.
  3. Should I use Notebook 2: APA Variant Effect Prediction? If so, how do I obtain the input sequences? Do the parameters need to be adjusted?
  4. Can I use APARENT to study other aspects of APA? I now have the pPAS and dPAS sites for my genes.
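The input described in question 2, read here as a 205-nt window starting 100 nt upstream of each poly-A site, can be extracted with a small helper. This is a sketch under assumptions the issue does not state: `pas_window` and `reverse_complement` are hypothetical names, `pas_pos` is a 0-based cleavage-site coordinate on a plain chromosome string, and minus-strand sites are reverse-complemented so the window is in transcript orientation with the site at index 100.

```python
UPSTREAM = 100  # nt taken upstream of the poly-A site (assumed from the question)
WINDOW = 205    # total window length (assumed from the question)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def pas_window(chrom_seq: str, pas_pos: int, strand: str = "+") -> str:
    """Hypothetical helper: 205-nt window starting 100 nt upstream of a poly-A site.

    On '+' the window covers genomic [pas_pos - 100, pas_pos + 105).
    On '-' "upstream" means higher coordinates, so the genomic slice is
    [pas_pos - 104, pas_pos + 101) and it is reverse-complemented.
    In both cases the poly-A site ends up at index 100 of the result.
    """
    if strand == "+":
        start = pas_pos - UPSTREAM
    elif strand == "-":
        start = pas_pos + UPSTREAM + 1 - WINDOW
    else:
        raise ValueError("strand must be '+' or '-'")
    end = start + WINDOW
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window runs off the end of the sequence")
    window = chrom_seq[start:end].upper()
    return window if strand == "+" else reverse_complement(window)
```

The same helper would be called once for the pPAS and once for the dPAS of each gene.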

Best,
Yang

@johli (Owner)

johli commented Feb 16, 2023

Hi Yang,

  1. I would recommend using APARENT2, since it is more accurate than APARENT; functionally they do the same thing. If you want to try APARENT2, I would recommend repurposing the notebook https://github.com/johli/aparent-resnet/blob/master/examples/aparent2_score_variants.ipynb, which is currently used to score variants but can probably be adapted to your needs quite easily. If you are only interested in scoring the wild-type strength of your pPASes and dPASes, you can enumerate the pPAS sequences in the 'ref_seqs' list and the dPAS sequences in the 'var_seqs' list, then look at the corresponding logit predictions 'np.log(ref_iso_pred / (1. - ref_iso_pred))' and 'np.log(var_iso_pred / (1. - var_iso_pred))'. These correspond to the PAS affinity ('strength') of the pPAS and dPAS, respectively.
  2. Yes, this notebook allows you to combine the individual logit predictions of the pPAS and dPAS into a % proximal usage. 'Non-normalized proximal sum-cut logit' and 'Non-normalized distal sum-cut logit' correspond to the individual pPAS and dPAS strengths (logits), and 'Predicted proximal vs. distal isoform % (APADB)' is the resulting proximal probability when the individual logits are combined (the combination weights are fitted to APADB). 'site_distance' is the distance in nucleotides between the pPAS and dPAS. 'prox_cut_start' to 'prox_cut_end' indicate the range over which cleavage events are summed (the core hexamer is expected to start at position 70 in the sequence, so prox_cut_start = 80 means that we only count cleavage from 10 bp downstream of the hexamer start). You can probably leave these at their defaults. Unfortunately, this script does not exist (yet) for APARENT2.
  3. I don't think so, since you are not interested in scoring mutations in any of your polyA signals, only their wild-type strength.
  4. Sorry, I didn't understand this question. Could you repeat it?
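For item 1, the conversion from a predicted isoform proportion to a strength logit can be wrapped in a small helper. This is a minimal sketch: `iso_logit` is a hypothetical name, the placeholder arrays stand in for the notebook's `ref_iso_pred` (pPAS) and `var_iso_pred` (dPAS) outputs, and the `eps` clipping is an added guard against proportions of exactly 0 or 1 that the notebook expression itself does not include.

```python
import numpy as np


def iso_logit(iso_pred, eps=1e-7):
    """Convert predicted isoform proportion(s) to logit(s): log(p / (1 - p)).

    Values are clipped to [eps, 1 - eps] so the log stays finite.
    """
    p = np.clip(np.asarray(iso_pred, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


# Placeholder values; in the notebook these would be the predictions
# for the 'ref_seqs' (pPAS) and 'var_seqs' (dPAS) lists, in matching order.
ref_iso_pred = np.array([0.30, 0.65])
var_iso_pred = np.array([0.80, 0.40])

prox_strength = iso_logit(ref_iso_pred)  # pPAS strength per gene
dist_strength = iso_logit(var_iso_pred)  # dPAS strength per gene
```

Keeping the two lists in the same gene order lets `prox_strength[i]` and `dist_strength[i]` be read as the pPAS/dPAS pair of gene `i`.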
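For item 2, the two steps described (summing cleavage over a window, then combining the two site strengths) can be sketched as follows. Everything here is an assumption for illustration: the cleavage arrays are toy inputs, the window is treated as half-open `[cut_start, cut_end)`, the end position 100 is arbitrary, and `combine_proximal_fraction` is a generic logistic model with placeholder weights, not the APADB-fitted combination used in the notebook.

```python
import numpy as np


def sum_cut_logit(cut_probs, cut_start, cut_end, eps=1e-7):
    """Sum per-position cleavage probabilities over [cut_start, cut_end)
    and return the logit log(s / (1 - s)) of the summed probability s."""
    window = np.asarray(cut_probs, dtype=float)[cut_start:cut_end]
    s = float(np.clip(window.sum(), eps, 1.0 - eps))
    return float(np.log(s / (1.0 - s)))


def combine_proximal_fraction(prox_logit, dist_logit, site_distance,
                              w_prox, w_dist, w_len, bias):
    """Hypothetical logistic combination of pPAS/dPAS strengths and the
    log site distance into a proximal-usage probability in (0, 1)."""
    z = bias + w_prox * prox_logit + w_dist * dist_logit + w_len * np.log(site_distance)
    return float(1.0 / (1.0 + np.exp(-z)))


# Toy cleavage profiles over a 205-nt window (real values would come from the model).
prox_cut = np.zeros(205)
prox_cut[85:95] = 0.06  # 60% of cleavage, 15-24 nt past the hexamer start at 70
dist_cut = np.zeros(205)
dist_cut[85:95] = 0.03  # 30% of cleavage in the same positions

prox_logit = sum_cut_logit(prox_cut, cut_start=80, cut_end=100)
dist_logit = sum_cut_logit(dist_cut, cut_start=80, cut_end=100)

# Placeholder weights: a stronger pPAS raises and a stronger dPAS lowers
# the proximal fraction; w_len = 0 ignores site_distance in this toy case.
p_prox = combine_proximal_fraction(prox_logit, dist_logit, site_distance=500,
                                   w_prox=1.0, w_dist=-1.0, w_len=0.0, bias=0.0)
```

With these placeholder weights the result reduces to the odds ratio of the two sums, (0.6/0.4) / (0.3/0.7) = 3.5, giving a proximal fraction of 3.5 / 4.5 ≈ 0.78.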

All the best,

Johannes

@yangjywhu (Author)

Hi Johannes @johli

Thank you very much for your prompt reply! Most of my questions have been solved.

For question 4, I would like to know whether I can use my PAS information to predict an effect such as a mutation or a disease. For example, with Notebook 1: APA Isoform & Cleavage Prediction, if I provide a pPAS sequence and a dPAS sequence, can I predict that the risk of a mutation or disease is greater when the gene's isoform ratio (or PDUI) is above some threshold, as in the developmental and disease analyses in your paper?

In short, I have calculated the PDUI of some genes and found some particular patterns, and I wonder whether these have any effect in developmental or disease states. Can APARENT (or APARENT2) predict such a situation?
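One way to put a measured PDUI and an APARENT two-site prediction on the same scale is sketched below. It assumes PDUI is the fraction of distal (long 3' UTR) usage, as in DaPars, and that each gene has only the two sites, so a predicted proximal fraction p implies a PDUI of 1 - p. The function names and the threshold are hypothetical, and Notebook 1 reports the proximal isoform as a percentage, so that value would need dividing by 100 first. The sketch only compares measured and predicted usage.

```python
def predicted_pdui(prox_fraction: float) -> float:
    """PDUI-like distal usage implied by a predicted proximal fraction.

    Assumes exactly two isoforms (pPAS and dPAS), so distal = 1 - proximal.
    """
    if not 0.0 <= prox_fraction <= 1.0:
        raise ValueError("prox_fraction must be in [0, 1]")
    return 1.0 - prox_fraction


def genes_above(pdui_by_gene: dict, threshold: float) -> list:
    """Sorted gene names whose PDUI exceeds a (hypothetical) threshold."""
    return sorted(gene for gene, pdui in pdui_by_gene.items() if pdui > threshold)


def pdui_residuals(measured: dict, predicted_prox: dict) -> dict:
    """Measured PDUI minus the APARENT-implied PDUI, for genes present in both."""
    return {
        gene: measured[gene] - predicted_pdui(predicted_prox[gene])
        for gene in measured
        if gene in predicted_prox
    }
```

A large positive residual would mark a gene whose measured distal usage is higher than its sequence-based prediction alone suggests.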

Best,
Yang
