When using Akita for single-cell prediction on the HCT116 cell line, the Pearson correlation coefficient is much lower. #186
Comments
The tutorial parameters are intended to demonstrate the model and training. You'll want to take parameters from the full model in order to reproduce the paper results.
Thank you for your response. I have used the model parameters from the paper (link: https://github.com/calico/basenji/blob/master/manuscripts/akita/params.json). Which parameters should be adjusted to replicate the experiment? Are they related to data processing?
You don't need to adjust the parameters to replicate.
Perhaps I didn't describe it clearly. What I meant was that I used the code you provide to process all the samples in sequence.bed (7008 for training, 413 for validation, and 415 for test) and trained with the parameters you provided, but the Pearson r on the training set was only 0.4. How can I achieve a correlation of 0.6?
It's impossible to say with the information you've given. Could you provide more details?
Of course. The parameters are as in this params.json.
I brainstormed a bit with Geoff, and one thing we caught was that you need to set -k 1 for the akita_data.py script to perform Gaussian smoothing of the data, and make sure the values are getting clipped to [-2,2]. I'm copy-pasting the Methods paragraph with these details:

"To focus on locus-specific patterns and mitigate the impact of sparse sampling present in even the currently highest-resolution Hi-C maps, we adaptively coarse-grain, normalize for the distance-dependent decrease in contact frequency, take a natural log, clip to (−2,2), linearly interpolate missing bins and convolve with a small 2D Gaussian filter (sigma, 1 and width, 5). The first to third steps use cooltools functions (https://github.com/mirnylab/cooltools). Interpolation of low-coverage bins filtered out in typical Hi-C pipelines was crucial for learning with log(observed/expected) Hi-C targets, greatly outperforming replacing these bins with zeros."
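For readers who want to sanity-check their processed targets, the last four steps of the quoted Methods paragraph (natural log, clip, interpolate missing bins, Gaussian smoothing) can be sketched roughly as below. This is a NumPy-only illustration, not the code in akita_data.py: the function names are hypothetical, the input is assumed to be an already coarse-grained observed/expected matrix with NaN marking filtered bins, and the row-then-column 1D interpolation is a simplification standing in for whatever interpolation the real pipeline uses.

```python
import numpy as np


def gaussian_kernel_2d(sigma=1.0, width=5):
    """Normalized 2D Gaussian kernel (sigma 1, width 5 as in the Methods text)."""
    half = width // 2
    ax = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-0.5 * (ax / sigma) ** 2)
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def interpolate_nans_rowwise(mat):
    """Linearly interpolate NaNs along each row; all-NaN rows are left as-is."""
    out = mat.copy()
    cols = np.arange(mat.shape[1])
    for i in range(out.shape[0]):
        bad = np.isnan(out[i])
        if bad.any() and not bad.all():
            out[i, bad] = np.interp(cols[bad], cols[~bad], out[i, ~bad])
    return out


def smooth(mat, kernel):
    """Apply a symmetric kernel with edge padding (output keeps the input shape)."""
    half = kernel.shape[0] // 2
    padded = np.pad(mat, half, mode="edge")
    n, m = mat.shape
    out = np.zeros((n, m), dtype=float)
    for di in range(kernel.shape[0]):
        for dj in range(kernel.shape[1]):
            out += kernel[di, dj] * padded[di:di + n, dj:dj + m]
    return out


def preprocess_obs_exp(obs_over_exp, clip=2.0, sigma=1.0, width=5):
    """log -> clip -> interpolate missing bins -> Gaussian smoothing (a sketch)."""
    mat = np.asarray(obs_over_exp, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        logged = np.log(mat)                    # natural log; NaN bins stay NaN
    clipped = np.clip(logged, -clip, clip)      # log(0) = -inf becomes -clip
    filled = interpolate_nans_rowwise(clipped)  # fill gaps along rows
    filled = interpolate_nans_rowwise(filled.T).T  # then fully masked rows, via columns
    filled = np.nan_to_num(filled, nan=0.0)     # assumption: anything left becomes 0
    return smooth(filled, gaussian_kernel_2d(sigma, width))


if __name__ == "__main__":
    demo = np.ones((8, 8))
    demo[3, :] = np.nan   # a filtered low-coverage bin masks its row ...
    demo[:, 3] = np.nan   # ... and its column
    demo[0, 1] = 1e6      # an extreme value that clipping should tame
    result = preprocess_obs_exp(demo)
    print(result.min(), result.max())
```

If targets produced without -k 1 are compared against output like this, the unsmoothed maps should look visibly noisier, which is consistent with the lower correlation reported in this thread.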
Thank you very much. I set the 'clip' in the target.txt file, but I forgot to use the '-k 1' parameter during processing. I will add the parameter and retrain to see the results.
Hi, I have downloaded the sequence.bed file and the cool file for the HCT116 cell line from your Google resource. I have performed data processing and training using your provided code and the parameters given in the tutorial. However, the Pearson correlation coefficient (PearsonR) is significantly lower than 0.6.