hanaecarrie changed the title from "Unable to reproduce paper results" to "Unable to reproduce the GBM model for cancer/healthy classification from lpWGS cfDNA" on Apr 19, 2021
Dear authors, dear all,
I am trying to reproduce the paper results and apply this method to new data, but I am facing several issues. Many of them have already been raised in earlier issues that are still open.
The trained GBM model used in the paper is not provided in the repository.
Could you please provide the model weights?
Meanwhile, I am trying to redo the whole training process, but two points in the code are not clear to me:
a) for the GC-corrected short and total fragment coverage
In script 0.5.summarise_data.R, I do not understand why the 'healthy.median' variable is calculated from the df.fr3 dataframe ("../inst/extdata/bins_5mbcompartments.rds"). As I understand it, df.fr3 contains the features of the whole dataset, with both cancer and healthy samples. Why, then, compute correlations between this dataset and the healthy samples of that same dataset? This would create data leakage. Am I missing something?
b) for the additional features (Z-scores / mitochondria-aligned reads features)
Could you please provide the code used to generate the sample_reference.csv file, as well as the mean/std of the independent set of 50 healthy samples used to compute the Z-scores?
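To make both points concrete, here is a minimal Python sketch of what I would have expected. It is not the repository's R code. All function names, the array layout (samples in rows, bins or features in columns) and the toy data are my own assumptions, for illustration only. For (a), the healthy reference median is computed from the healthy samples of the training split only, so held-out samples never contribute to the reference they are later correlated with. For (b), Z-scores are computed against the mean/std of a separate healthy reference set.

```python
import numpy as np


def healthy_reference_median(coverage, is_healthy, train_idx):
    """Per-bin median over the healthy samples of the training split only.

    coverage:   (n_samples, n_bins) array of GC-corrected coverage (assumed layout)
    is_healthy: (n_samples,) boolean array
    train_idx:  indices of the samples in the training split
    """
    train_idx = np.asarray(train_idx)
    healthy_train = train_idx[is_healthy[train_idx]]
    return np.median(coverage[healthy_train], axis=0)


def correlation_to_reference(coverage, reference):
    """Pearson correlation of each sample's bin profile with the reference profile."""
    return np.array([np.corrcoef(row, reference)[0, 1] for row in coverage])


def z_scores(values, ref_values):
    """Z-scores of `values` against an independent healthy reference set.

    ref_values: (n_reference_samples, n_features) array, e.g. the 50 healthy samples
    """
    mu = ref_values.mean(axis=0)
    sd = ref_values.std(axis=0, ddof=1)  # sample standard deviation
    return (values - mu) / sd


# Toy data only: 12 samples x 20 bins, first 6 healthy, last 6 cancer.
rng = np.random.default_rng(0)
coverage = rng.normal(size=(12, 20))
is_healthy = np.array([True] * 6 + [False] * 6)
train_idx = [0, 1, 2, 6, 7, 8]
test_idx = [3, 4, 5, 9, 10, 11]

reference = healthy_reference_median(coverage, is_healthy, train_idx)
test_correlations = correlation_to_reference(coverage[test_idx], reference)
```

With this split, changing any held-out sample leaves `reference` unchanged, which is the property I could not find in 0.5.summarise_data.R.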
Many thanks in advance for your attention and your time.
Best,
Hanaé