Datasets:
final_X_tcga_processed.hkl
: Expression and mutation features for each cell line from DepMap 22Q4's OmicsExpressionProteinCodingGenesTPMLogp1.csv, OmicsSomaticMutationsMatrixHotspot.csv, and OmicsSomaticMutationsMatrixDamaging.csv datasets. Expression features are z-scored, and each cell line's feature vector is L2-normalized to unit length.
final_X_tcga_raw_unnormalized.hkl
: Expression and mutation features for each cell line from DepMap 22Q4's OmicsExpressionProteinCodingGenesTPMLogp1.csv, OmicsSomaticMutationsMatrixHotspot.csv, and OmicsSomaticMutationsMatrixDamaging.csv datasets.
CRISPRGeneEffect_processed.hkl
: CRISPRGeneEffect.csv from DepMap 22Q4, filtered to the cell lines for which we have mutation and expression features.
Chronos_Combined_predictability_results.csv
: Predictability data from DepMap.
cancerGeneList.tsv
: OncoKB cancer genes (https://www.oncokb.org/cancer-genes).
sample_info.csv
: DepMap metadata for cell lines.
datasets/tcga_data_processed_figures.hkl
: TCGA data downloaded from Xena.
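As a rough illustration of the processing described for final_X_tcga_processed.hkl, the sketch below z-scores expression features and L2-normalizes each cell line's feature vector. It is not the code used to build the file: the function name `process_features`, the toy arrays, the `eps` guard, and the choice to z-score each gene across cell lines are all assumptions made for the example.

```python
import numpy as np


def process_features(expression, mutations, eps=1e-8):
    """Z-score expression columns, append mutation columns, L2-normalize rows.

    Hypothetical helper for illustration only.
    """
    expression = np.asarray(expression, dtype=float)
    mutations = np.asarray(mutations, dtype=float)

    # Assumption: each gene (column) is z-scored across cell lines (rows).
    mean = expression.mean(axis=0, keepdims=True)
    std = expression.std(axis=0, keepdims=True)
    z_scored = (expression - mean) / (std + eps)

    # Mutation features are appended unchanged.
    features = np.concatenate([z_scored, mutations], axis=1)

    # Scale each cell line's full feature vector to unit L2 norm.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


# Toy data: 5 cell lines, 4 expression features, 3 binary mutation features.
rng = np.random.default_rng(0)
toy_expression = rng.normal(size=(5, 4))
toy_mutations = rng.integers(0, 2, size=(5, 3))

X = process_features(toy_expression, toy_mutations)
row_norms = np.linalg.norm(X, axis=1)
```

Each row of `X` then has unit L2 norm, so `row_norms` is a quick check that the normalization was applied per cell line rather than per feature.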
Files:
train_and_get_grads.ipynb
: Train one kernel regression model per knockout (KO) and get feature importances for each KO.
demo.py
: Use the calculated feature importances to visualize feature importance distributions for a given KO.
generate_figures.ipynb
: Generate the main text figures.
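To give a feel for the train_and_get_grads.ipynb step, here is a minimal sketch of kernel regression followed by gradient-based feature importances for a single KO. It is not the notebook's implementation: the Gaussian kernel, the ridge term, the bandwidth, the toy data, and the definition of importance as the mean absolute input gradient are assumptions, and every function name is hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth ** 2))


def fit_kernel_ridge(X, y, bandwidth=1.0, ridge=1e-2):
    """Solve (K + ridge * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, bandwidth)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)


def predict(X_train, alpha, X_new, bandwidth=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i) at each row of X_new."""
    return rbf_kernel(X_new, X_train, bandwidth) @ alpha


def input_gradients(X_train, alpha, X_new, bandwidth=1.0):
    """Gradient of f with respect to the input, at each row of X_new.

    Uses d/dx k(x, x_i) = -k(x, x_i) * (x - x_i) / bandwidth**2.
    """
    W = rbf_kernel(X_new, X_train, bandwidth) * alpha[None, :]
    return -(W.sum(1, keepdims=True) * X_new - W @ X_train) / bandwidth ** 2


# Toy stand-in for one KO: 60 cell lines, 3 features.
# The target is built from feature 0 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.sin(X[:, 0])

alpha = fit_kernel_ridge(X, y)
grads = input_gradients(X, alpha, X)

# Assumed importance score: mean absolute gradient per feature.
importance = np.abs(grads).mean(axis=0)
```

Repeating the fit with a different target vector `y` for each KO would give one importance vector per KO, which is the kind of output demo.py is described as visualizing.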
Feel free to direct any questions about the code to [email protected].