When using Akita for single-cell prediction on the HCT116 cell line, the Pearson correlation coefficient is much lower. #186

1944498970 · 2023-11-25T05:58:20Z

Hi, I have downloaded the sequences.bed file and the .cool file for the HCT116 cell line from your Google resource. I have performed data processing and training using your provided code and the parameters given in the tutorial. However, the Pearson correlation coefficient (PearsonR) is significantly lower than 0.6.

davek44 · 2023-12-09T23:07:25Z

The tutorial parameters are intended to demonstrate the model and training. You'll want to take parameters from the full model in order to reproduce the paper results.

1944498970 · 2023-12-10T02:42:12Z

Thank you for your response. I have used the model parameters mentioned in the paper (link: https://github.com/calico/basenji/blob/master/manuscripts/akita/params.json). What parameters should be adjusted to replicate the experiment? Are these related to data processing?

davek44 · 2023-12-15T01:48:05Z

You don't need to adjust the parameters to replicate.

1944498970 · 2023-12-15T03:47:23Z

Perhaps I didn't describe it clearly. What I meant was that I used the code you provide to process all the samples in sequences.bed (7008 for training, 413 for validation, and 415 for testing) and the parameters you provided to train, but the resulting Pearson R on the training set was only 0.4. How can I achieve a correlation of 0.6?

davek44 · 2023-12-15T18:51:35Z

It's impossible to say with the information you've given. Could you provide more details?

1944498970 · 2023-12-16T07:46:55Z

Of course. I downloaded the file https://storage.googleapis.com/basenji_barnyard2/hg38.ml.fa.gz, and from https://storage.googleapis.com/basenji_hic, I downloaded the files Unsynchronized_all.hg38.2048.cool and sequences.bed. When using akita_data.py for data processing, I read the contents of the sequences.bed file and assigned it to mseqs to ensure that I am using the same data as you. Additionally, during processing, I selected the parameters -l 1048576 --crop 65536 --local --as_obsexp -p 16. Then, I used the parameters from https://github.com/calico/basenji/blob/master/manuscripts/akita/params.json and selected the -k parameter from akita_train.py for training. After training for 140 epochs, the Pearson R has stabilized (I disabled early stopping to train for enough epochs), and the R on the validation set does not exceed 0.45.

The parameters are the same as in this params.json.

davek44 · 2023-12-17T23:01:49Z

I brainstormed a bit with Geoff, and one thing we caught was that you need to set -k 1 for the akita_data.py script to perform Gaussian smoothing of the data and make sure the values are getting clipped to [-2,2]. I'm copy-pasting the Methods paragraph with these details.

To focus on locus-specific patterns and mitigate the impact of sparse sampling present in even the currently highest-resolution Hi-C maps, we adaptively coarse-grain, normalize for the distance-dependent decrease in contact frequency, take a natural log, clip to (−2,2), linearly interpolate missing bins and convolve with a small 2D Gaussian filter (sigma, 1 and width, 5). The first to third steps use cooltools functions (https://github.com/mirnylab/cooltools). Interpolation of low-coverage bins filtered out in typical Hi-C pipelines was crucial for learning with log(observed/expected) Hi-C targets, greatly outperforming replacing these bins with zeros.
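For readers who want to see the last four steps of that paragraph in code, here is a minimal NumPy sketch: natural log, clip to [-2, 2], linear interpolation of missing bins, and a 5x5 Gaussian filter with sigma 1. It is an illustration only, not the code in akita_data.py. It assumes the input is already an observed/expected matrix (the cooltools coarse-graining and distance normalization are omitted), it interpolates along rows as a simplification, and all function names (`gaussian_kernel_2d`, `interpolate_rows`, `smooth`, `preprocess_obsexp`) are hypothetical.

```python
import numpy as np


def gaussian_kernel_2d(sigma=1.0, width=5):
    """Normalized width x width Gaussian kernel (width assumed odd)."""
    half = width // 2
    ax = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-0.5 * (ax / sigma) ** 2)
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def interpolate_rows(mat):
    """Linearly interpolate NaN bins along each row (a simplification)."""
    out = mat.copy()
    cols = np.arange(mat.shape[1])
    for row in out:  # each row is a view, so edits land in `out`
        bad = np.isnan(row)
        if bad.all():
            row[:] = 0.0  # nothing to interpolate from in this row
        elif bad.any():
            row[bad] = np.interp(cols[bad], cols[~bad], row[~bad])
    return out


def smooth(mat, kernel):
    """Filter with a symmetric kernel, replicating edge values as padding."""
    half = kernel.shape[0] // 2
    padded = np.pad(mat, half, mode="edge")
    n_rows, n_cols = mat.shape
    out = np.zeros((n_rows, n_cols), dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + n_rows, j:j + n_cols]
    return out


def preprocess_obsexp(obsexp, clip=2.0, sigma=1.0, width=5):
    """log -> clip -> interpolate -> Gaussian smooth, in that order."""
    obsexp = np.asarray(obsexp, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        logged = np.log(obsexp)  # zero bins become -inf, NaN stays NaN
    clipped = np.clip(logged, -clip, clip)  # -inf becomes -clip
    filled = interpolate_rows(clipped)
    return smooth(filled, gaussian_kernel_2d(sigma, width))
```

Calling `preprocess_obsexp(matrix)` on a square observed/expected array with NaN in the filtered bins returns a smoothed array of the same shape with no NaN values. If the smoothing step is skipped, the training targets are noisier than the ones described in the Methods, which is one plausible reason for a lower Pearson R.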

1944498970 · 2023-12-18T07:27:54Z

Thank you very much. I set the 'clip' in the target.txt file, but I forgot to use the '-k 1' parameter during processing. I will add the parameter and retrain to see the results.
