This repo accompanies our survey paper:
Machine Learning Applications for Therapeutic Tasks with Genomics Data. Kexin Huang, Cao Xiao, Lucas M. Glass, Cathy W. Critchlow, Greg Gibson, Jimeng Sun. Published in Patterns.
We list tools, algorithms, and data for this area. Feel free to make a pull request to add new resources.
- Machine Learning for Genomics and Therapeutics Resources
- Machine Learning for Genomics in Target Discovery
- Machine Learning for Genomics in Therapeutics Discovery
- Machine Learning for Genomics in Clinical Study
- Machine Learning for Genomics in Post-Market Study
Task Description Given a set of DNA/RNA sequences, predict their binding scores. After training, use feature importance attribution methods to identify the motifs.
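As a rough illustration of the input encoding and of what a "binding score" can look like, the sketch below one-hot encodes a DNA sequence and scans it with a position weight matrix (PWM). This is a classical baseline swapped in for illustration, not a method from the survey; the helper names and the toy PWM weights are hypothetical.

```python
# Hypothetical helper names and toy weights, for illustration only.
ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA sequence as rows of 0/1 values in A, C, G, T order.
    Unknown bases (e.g. N) become all-zero rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]

def best_motif_hit(seq, pwm):
    """Slide a PWM over the sequence and return (best score, start offset)."""
    encoded = one_hot(seq)
    width = len(pwm)
    best = (float("-inf"), -1)
    for start in range(len(encoded) - width + 1):
        score = sum(encoded[start + i][j] * pwm[i][j]
                    for i in range(width) for j in range(4))
        best = max(best, (score, start))
    return best

# Toy PWM that rewards the motif "TGA" (assumed weights).
toy_pwm = [
    [0.0, 0.0, 0.0, 1.0],  # position 1 prefers T
    [0.0, 0.0, 1.0, 0.0],  # position 2 prefers G
    [1.0, 0.0, 0.0, 0.0],  # position 3 prefers A
]
score, offset = best_motif_hit("CCTGACC", toy_pwm)
```

A learned model would replace the fixed PWM with trained filters; attribution methods then aim to recover motif-like patterns such as the one hard-coded here.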
Task Description For a DNA/RNA position with missing methylation status, given its available neighboring methylation states and the DNA/RNA sequence, predict the methylation status at the position of interest.
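A minimal baseline for this imputation task, assuming methylation states are coded as 0/1 and that nearer neighbors are more informative: take an inverse-distance-weighted average of the known neighboring states. The function name and the `window` parameter are hypothetical, and the sketch ignores the sequence context that the task also provides.

```python
def impute_methylation(positions, states, target, window=100):
    """Inverse-distance-weighted average of known states within `window` bp.

    positions: genomic coordinates of CpG sites
    states:    0/1 methylation state per site, or None when missing
    target:    coordinate whose state should be imputed
    Returns a value in [0, 1], or None when no known neighbor is in range.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for pos, state in zip(positions, states):
        distance = abs(pos - target)
        if state is None or distance == 0 or distance > window:
            continue
        weight = 1.0 / distance
        weighted_sum += weight * state
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total

# Site at 110 is missing; neighbors at 100 (methylated) and 130 (unmethylated).
estimate = impute_methylation([100, 110, 130], [1, None, 0], target=110)
```

The estimate can be thresholded at 0.5 to obtain a binary call, or kept as a soft probability.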
Keith D Robertson. DNA methylation and human disease. Nature Reviews Genetics, 6(8):597–610, 2005.
Task Description Given an RNA sequence and, if available, its cell type, predict for each nucleotide the probability of being a splice breakpoint and the splicing level.
Task Description Given the histopathology image of the tissue, predict the gene expression for every gene at each spatial transcriptomics spot.
Task Description Given the gene expressions of a set of cells (in bulk RNA-seq or a spot in spatial transcriptomics), infer proportion estimates of each cell type for this set.
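One common way to frame deconvolution, assuming a known signature matrix of per-cell-type expression, is non-negative least squares: find proportions p ≥ 0 such that signature × p approximates the bulk profile. The sketch below solves this with projected gradient descent in plain Python. The function name and toy data are hypothetical, and this framing is only one of several possible approaches.

```python
def deconvolve(signature, bulk, steps=2000):
    """Estimate cell-type proportions from a bulk expression profile.

    signature: one row per gene, one column per cell type (assumed known)
    bulk:      observed expression per gene for the mixed sample
    Returns non-negative proportions normalized to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / squared Frobenius norm is a safe step size for least squares.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [sum(s * w for s, w in zip(signature[g], p)) - bulk[g]
                    for g in range(n_genes)]
        grad = [sum(signature[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, w - step * g_k) for w, g_k in zip(p, grad)]
    total = sum(p)
    return [w / total for w in p] if total > 0 else p

# Toy example: two cell types mixed 30% / 70% across three genes.
toy_signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
toy_bulk = [3.0, 7.0, 5.0]
proportions = deconvolve(toy_signature, toy_bulk)
```

In practice a dedicated solver would likely be preferable to this hand-rolled loop; the loop is used here only to keep the example dependency-free.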
Task Description Given a set of gene expression profiles of a gene set, identify the gene regulatory network by predicting all pairs of interacting genes.
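A very simple starting point for pairwise interaction prediction is a co-expression network: score each gene pair by the Pearson correlation of their expression across samples and keep pairs above a threshold. This is a sketch under that assumption, with hypothetical function names and toy data; correlation alone does not establish regulatory direction or causality.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def coexpression_edges(expression, threshold=0.9):
    """Return (gene_a, gene_b, r) for every pair with |r| >= threshold.

    expression: dict mapping gene name -> expression values across samples
    """
    edges = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges

# Toy data: G2 tracks G1 exactly, G3 does not.
toy_expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [1.0, -1.0, 1.0, -1.0],
}
edges = coexpression_edges(toy_expression)
```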
Task Description Given the aligned sequencing data for each locus, either (1) a read pileup image, which is an M × N matrix with M the number of reads and N the length of reads, or (2) the raw reads, which are a set of sequence strings, classify the multi-class variant status.
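To make the pileup representation concrete, the sketch below builds an integer matrix with one row per read and one column per position in a window around the locus. It assumes reads are already aligned without insertions or deletions and are given as (start position, sequence) pairs; the function name and the base-to-integer coding are hypothetical.

```python
# Assumed integer coding; 0 marks positions a read does not cover.
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}

def pileup_matrix(reads, window_start, window_len):
    """Build an M x N matrix (M reads, N window positions) of base codes.

    reads: list of (start_position, sequence) pairs, aligned without indels
    """
    matrix = []
    for start, seq in reads:
        row = [0] * window_len
        for offset, base in enumerate(seq):
            col = start + offset - window_start
            if 0 <= col < window_len:
                row[col] = BASE_CODES.get(base.upper(), 0)
        matrix.append(row)
    return matrix

# Two overlapping reads in a 6 bp window starting at position 100.
toy_reads = [(100, "ACGT"), (102, "GTTA")]
image = pileup_matrix(toy_reads, window_start=100, window_len=6)
```

A classifier would then take such a matrix, often with extra channels such as base quality or strand, and output the variant class for the locus.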
Task Description Given features about a variant, predict its corresponding disease risk and then rank all variants based on the disease risk. Alternatively, given the DNA sequence or other related genomics features, predict the likelihood of disease risk for this sequence and retrieve the variant in the sequence that contributes highly to the risk prediction.
Task Description Given the gene expression data and other auxiliary data of a patient, predict whether this patient has a rare disease. Also, identify genetic variants for this rare disease.
Task Description Given the known gene-disease association network and auxiliary information, predict the association likelihood for every unknown gene-disease pair.
Task Description Given the gene expression data for a phenotype and known gene relations, identify a set of genes corresponding to disease pathways.
Task Description Given a drug compound's molecular structure paired with the gene expression profile of a cell line, predict the drug response in this context.
Task Description Given a combination of drug compound structures and a cell line’s genomics profile, predict the combination response.
Task Description With a fixed target, given the gRNA sequence and other auxiliary information such as target gene expression and epigenetic profile, predict its on-target repair outcome.
Task Description Given the gRNA sequence and the off-target DNA sequence, predict its off-target effect.
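A basic input feature for off-target prediction is where the gRNA and the candidate off-target site disagree. The sketch below lists mismatch positions, assuming both sequences have equal length and are written 5' to 3' on the same strand, with RNA "U" treated as DNA "T". The function name is hypothetical, and a real predictor would combine such features with a learned model.

```python
def mismatch_positions(grna, target):
    """Return 0-based positions where the gRNA and target site differ."""
    # Normalize case and treat RNA uracil as thymine for comparison.
    grna = grna.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    if len(grna) != len(target):
        raise ValueError("gRNA and target must have the same length")
    return [i for i, (g, t) in enumerate(zip(grna, target)) if g != t]

# One mismatch at position 3 (G vs C); the U/T pair at position 4 matches.
positions = mismatch_positions("GACGU", "GACCT")
```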
Task Description Given a set of virus sequences and their labels for a property X, obtain an accurate predictor oracle and use generative modeling to produce de novo virus variants with a high score in X and high diversity.
Task Description Given genotype-phenotype data of animals and only the genotype data of humans, train a model to predict phenotype from genotype and transfer this model to humans.
Task Description Given the gene expression and other auxiliary information for a set of patients, produce criteria for patient stratification.
Task Description Given a pair of patient data (genomics, EHR, etc.) and trial eligibility criteria (text description), predict the matching likelihood.
Task Description Given observational data of the genomic factor, exposure, outcome, and other auxiliary information, formulate or identify the causal relations among them and compute the effect of the exposure on the outcome.
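One standard estimator for this setting, not named in the task description itself, is the Wald ratio used in Mendelian randomization: the effect of the genomic factor on the outcome divided by its effect on the exposure. The sketch below assumes a single genetic variant that is a valid instrument (it influences the outcome only through the exposure) and linear effects; the function names and toy data are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares regression line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

def wald_ratio(genotype, exposure, outcome):
    """Causal effect estimate of exposure on outcome via a genetic instrument."""
    return ols_slope(genotype, outcome) / ols_slope(genotype, exposure)

# Toy data: each allele raises exposure by 2; outcome is 3 x exposure.
toy_genotype = [0, 1, 2, 0, 1, 2]
toy_exposure = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
toy_outcome = [3.0, 9.0, 15.0, 3.0, 9.0, 15.0]
effect = wald_ratio(toy_genotype, toy_exposure, toy_outcome)
```

With real data the instrument assumptions need to be checked, and multiple variants are usually combined rather than relying on one.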
Task Description Given a clinical note document, predict the genomic biomarker variable of interest.
Task Description Given a document from the literature, extract the drug-gene and drug-disease terms and predict the interaction types from the text.