Chrombpnet model uncorrected #214

Closed
linzyzhao2002 opened this issue Nov 30, 2024 · 20 comments

Comments

@linzyzhao2002

Hello,

I hope this message finds you well. I am running into a problem while training the ChromBPNet model: the report says "the model is uncorrected, average of the max of the profiles is 0.024". However, in the Tn5 profile shown below, the maximum is below 0.001.

Additionally, I am training with a pretrained bias model that uses the same peak and non-peak regions as the ChromBPNet model. Could you let me know why this error occurs? Any help would be deeply appreciated!
[Image: Screenshot 2024-11-29 at 19 43 18]

@panushri25
Collaborator

Can you share your full report?

@panushri25
Collaborator

Also if you can share the full output directory that will be helpful

@linzyzhao2002
Author

Can you share your full report?

Thank you so much for your help!! This is the output for my chrombpnet model:
train model output 19.pdf

This is the model output for my bias model:
bias model output.pdf
I trained the bias model with a hyperparameter value of 0.4.

@linzyzhao2002
Author

Also if you can share the full output directory that will be helpful
slurm_54367172.pdf
I'm attaching the full output directory.

Any help would be greatly appreciated - thank you so much for your time and help!!!

@linzyzhao2002
Author

Also just to add, I noticed that the pearsonr in peaks is always >0 (around 0.30) for the bias model. Could this be the problem here? Does this suggest anything wrong with the peak file or the bam file?

I generated the peak file using macs2 callpeak with a q-value cutoff of 0.01 and then removed peaks in blacklist regions. The BAM files I used were deduplicated and had MT reads removed. Please let me know, and thank you so much for your help!!
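For readers following along, the blacklist step described above can be sketched as follows. This is a minimal illustration, not the command actually used in this thread: the function names are hypothetical, intervals are assumed to be half-open BED-style `(chrom, start, end)` tuples, and a peak is dropped if it overlaps a blacklist interval by even one base (the same behaviour as `bedtools intersect -v`).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def remove_blacklisted(peaks, blacklist):
    """Drop every peak that overlaps any blacklist interval on the same chromosome.

    Both arguments are lists of (chrom, start, end) tuples in BED coordinates.
    """
    # Group blacklist intervals by chromosome so each peak is only
    # compared against intervals on its own chromosome.
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in peaks:
        hits = by_chrom.get(chrom, [])
        if not any(overlaps(start, end, b_start, b_end) for b_start, b_end in hits):
            kept.append((chrom, start, end))
    return kept
```

Peaks that merely touch a blacklist interval (for example a peak starting exactly where the interval ends) are kept, because BED end coordinates are exclusive.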

@panushri25
Collaborator

@linzyzhao2002 noticed you posted on the other thread too. Are you merging peaks across celltypes as well?

@linzyzhao2002
Author

@linzyzhao2002 noticed you posted on the other thread too. Are you merging peaks across celltypes as well?

No I'm not. I am training chrombpnet on only one cell type (bulk ATAC-seq for mouse T cells), sampled at different time points (0h, 2h, 5h etc). My bias model was trained with peaks called at the 5h time point. Do you have any suggestions? Thank you so much!

@linzyzhao2002
Author

To add context, I am analyzing time-course ATAC-seq data from in vitro activated mouse T cells. I use a single time point (say 5h) to train the bias model, and then use this bias model to train my ChromBPNet model on the 5h BAM file. However, the nobias model was reported as uncorrected (average profile maximum for Tn5 is 0.024), and TFmodisco found Tn5 motifs. I tried sweeping the hyperparameter (from 0.2 to 0.5) but still got the same results. Could you please let me know what could be happening here? Thank you so much!!

Sincerely,
Linzy

@panushri25
Collaborator

Is the 5h timepoint the deepest dataset you have? I would recommend using the deepest time point to build the bias model and then use it across time points.

@panushri25
Collaborator

Was this file (train model output 19.pdf) generated using the bias model trained on the 5h timepoint BAM file, or the bias model provided with the repo? Can you provide both reports, one for each bias model? I would generally recommend using the deepest time point to build the bias model and then using it across time points.

@linzyzhao2002
Author

Was this file (train model output 19.pdf) generated using the bias model trained on the 5h timepoint BAM file, or the bias model provided with the repo? Can you provide both reports, one for each bias model? I would generally recommend using the deepest time point to build the bias model and then using it across time points.

Thank you so much for your reply and happy new year!

I'm sorry if I wasn't clear.

  1. I trained a bias model using the 5h timepoint bam file
    bias model 3 output.pdf

  2. I then used this bias model to train a non-bias model on the same 5h BAM file. However, I see that the model is uncorrected.
    train model output 19.pdf

Could you please suggest possible reasons why the model is uncorrected? Is there anything wrong based on the reports? Thank you so much!!!

@panushri25
Collaborator

It's strange that your marginal footprints for the bias motif are different from 0.024. My guess is that the output directory is corrupted and has outputs from multiple runs.
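To make the discrepancy concrete, here is a minimal sketch of the quantity the report message names. It assumes that "average of the max of the profiles" means the mean, over the Tn5 motifs, of each marginal footprint's maximum value. The function names and the `threshold` parameter are hypothetical and are not taken from the ChromBPNet code.

```python
def average_profile_max(profiles):
    """Mean, over motifs, of the maximum value in each marginal footprint profile."""
    if not profiles:
        raise ValueError("need at least one profile")
    return sum(max(profile) for profile in profiles) / len(profiles)


def is_corrected(profiles, threshold):
    """True when the average profile maximum stays below the (hypothetical) threshold."""
    return average_profile_max(profiles) < threshold
```

Under that assumption, a plotted profile whose maximum is below 0.001 can only coexist with a reported average of 0.024 if other profiles peak much higher, or if the plot and the reported number come from different runs.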

@linzyzhao2002
Author

linzyzhao2002 commented Jan 10, 2025

It's strange that your marginal footprints for the bias motif are different from 0.024. My guess is that the output directory is corrupted and has outputs from multiple runs.

Interesting!! Thanks for your suggestions. Does this mean that the model might in fact be "corrected", and I could use this for TFmodisco prediction? Thank you!

@panushri25
Collaborator

I can't say from this because I don't know what's corrupted and what's not. Can you send a screenshot of the directories with their timestamps? Did you make two runs in the same directory?

It might be better to do a fresh run in a new directory!

@linzyzhao2002
Author

I can't say from this because I don't know what's corrupted and what's not. Can you send a screenshot of the directories with their timestamps? Did you make two runs in the same directory?

It might be better to do a fresh run in a new directory!

Thanks for the suggestion! I'm sorry, I don't have access to the folders right now since the cluster is down for maintenance, but for each model I used a new output directory (a new folder)!

@linzyzhao2002
Author

I can't say from this because I don't know what's corrupted and what's not. Can you send a screenshot of the directories with their timestamps? Did you make two runs in the same directory?

It might be better to do a fresh run in a new directory!

Maybe I will try TFmodisco using this nobias model and then see if the predictions make sense

@panushri25
Collaborator

what is the read depth of your individual time points?

@linzyzhao2002
Author

what is the read depth of your individual time points?

The read depths for the time points are quite similar, around 50 million reads each! Is this sufficient?

@panushri25
Collaborator

Yeah, that's fine! I think this is more of a corruption issue. From the timestamps, you can make sure that all the files in the evaluation/ directory are dated after the creation of the ChromBPNet models.
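One way to run that timestamp check is sketched below. It is only an illustration: the function name is hypothetical, and both paths are placeholders that you would point at your own evaluation directory and trained model file. It compares file modification times, so it will not catch files that were copied or touched after the fact.

```python
import os


def stale_files(eval_dir, model_path):
    """Return files under eval_dir whose modification time predates model_path."""
    model_mtime = os.path.getmtime(model_path)
    stale = []
    # Walk the evaluation directory recursively and collect every file
    # that was last modified before the model file was written.
    for root, _dirs, files in os.walk(eval_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < model_mtime:
                stale.append(path)
    return sorted(stale)
```

An empty result means every evaluation file is at least as new as the model. Any path that is returned is a candidate leftover from an earlier run.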

If they are dated correctly you can post back the output html with a bias model provided with the repo and we can take it from there.

I will close this issue for now, do reply back or feel free to open a new issue if you have more questions.

@linzyzhao2002
Author

Yeah, that's fine! I think this is more of a corruption issue. From the timestamps, you can make sure that all the files in the evaluation/ directory are dated after the creation of the ChromBPNet models.

If they are dated correctly you can post back the output html with a bias model provided with the repo and we can take it from there.

I will close this issue for now, do reply back or feel free to open a new issue if you have more questions.

Sounds good! Thank you so much for all your help!!

2 participants