-
Notifications
You must be signed in to change notification settings - Fork 2
/
_viash.yaml
129 lines (121 loc) · 6.15 KB
/
_viash.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
name: task_perturbation_prediction
label: Perturbation Prediction
summary: Predicting how small molecules change gene expression in different cell types.
description: |
Human biology can be complex, in part due to the function and interplay of the body's
approximately 37 trillion cells, which are organized into tissues, organs, and systems.
However, recent advances in single-cell technologies have provided unparalleled insight
into the function of cells and tissues at the level of DNA, RNA, and proteins. Yet
leveraging single-cell methods to develop medicines requires mapping causal links
between chemical perturbations and the downstream impact on cell state. These experiments
are costly and labor intensive, and not all cells and tissues are amenable to
high-throughput transcriptomic screening. If data science could help accurately predict
chemical perturbations in new cell types, it could accelerate and expand the development
of new medicines.
Several methods have been developed for drug perturbation prediction, most of which are
variations on the autoencoder architecture (Dr.VAE, scGEN, and ChemCPA). However, these
methods lack proper benchmarking datasets with diverse cell types to determine how well
they generalize. The largest available training dataset is the NIH-funded Connectivity
Map (CMap), which comprises over 1.3M small molecule perturbation measurements. However,
the CMap includes observations of only 978 genes, less than 5% of all genes. Furthermore,
the CMap data is comprised almost entirely of measurements in cancer cell lines, which
may not accurately represent human biology.
This task aims to predict how small molecules change gene expression in different cell
types. This task was a [Kaggle competition](https://www.kaggle.com/competitions/open-problems-single-cell-perturbations/overview)
as part of the [NeurIPS 2023 competition track](https://neurips.cc/virtual/2023/competition/66586).
The task is to predict the gene expression profile of a cell after a small molecule
perturbation. For this competition, we designed and generated a novel single-cell
perturbational dataset in human peripheral blood mononuclear cells (PBMCs). We
selected 144 compounds from the Library of Integrated Network-Based Cellular Signatures
(LINCS) Connectivity Map dataset ([PMID: 29195078](https://pubmed.ncbi.nlm.nih.gov/29195078/))
and measured single-cell gene
expression profiles after 24 hours of treatment. The experiment was repeated in three
healthy human donors, and the compounds were selected based on diverse transcriptional
signatures observed in CD34+ hematopoietic stem cells (data not released). We performed
this experiment in human PBMCs because the cells are commercially available with
pre-obtained consent for public release and PBMCs are a primary, disease-relevant tissue
that contains multiple mature cell types (including T-cells, B-cells, myeloid cells,
and NK cells) with established markers for annotation of cell types. To supplement this
dataset, we also measured cells from each donor at baseline with joint scRNA and
single-cell chromatin accessibility measurements using the 10x Multiome assay. We hope
that the addition of rich multi-omic data for each donor and cell type at baseline will
help establish biological priors that explain the susceptibility of particular genes to
exhibit perturbation responses in difference biological contexts.
version: dev
license: MIT
keywords: [single-cell, perturbation prediction, perturbation, benchmark]
links:
issue_tracker: https://github.com/openproblems-bio/task_perturbation_prediction/issues
repository: https://github.com/openproblems-bio/task_perturbation_prediction
docker_registry: ghcr.io
references:
bibtex: |
@article{slazata2024benchmark,
title = {A benchmark for prediction of transcriptomic responses to chemical perturbations across cell types},
author = {Artur Szałata and Andrew Benz and Robrecht Cannoodt and Mauricio Cortes and Jason Fong and Sunil Kuppasani and Richard Lieberman and Tianyu Liu and Javier A. Mas-Rosario and Rico Meinl and Jalil Nourisa and Jared Tumiel and Tin M. Tunjic and Mengbo Wang and Noah Weber and Hongyu Zhao and Benedict Anchang and Fabian J Theis and Malte D Luecken and Daniel B Burkhardt},
booktitle = {The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year = {2024},
url = {https://openreview.net/forum?id=WTI4RJYSVm}
}
authors:
- name: Artur Szałata
roles: [ author ]
info:
github: szalata
orcid: "000-0001-8413-234X"
- name: Robrecht Cannoodt
roles: [ author ]
info:
github: rcannood
orcid: "0000-0003-3641-729X"
- name: Daniel Burkhardt
roles: [ author ]
info:
github: dburkhardt
orcid: 0000-0001-7744-1363
- name: Malte D. Luecken
roles: [ author ]
info:
github: LuckyMD
orcid: 0000-0001-7464-7921
- name: Tin M. Tunjic
roles: [ contributor ]
info:
github: ttunja
orcid: 0000-0001-8842-6548
- name: Mengbo Wang
roles: [ contributor ]
info:
github: wangmengbo
orcid: 0000-0002-0266-9993
- name: Andrew Benz
roles: [ author ]
info:
github: andrew-benz
orcid: 0009-0002-8118-1861
- name: Tianyu Liu
roles: [ contributor ]
info:
github: HelloWorldLTY
orcid: 0000-0002-9412-6573
- name: Jalil Nourisa
roles: [ contributor ]
info:
github: janursa
orcid: 0000-0002-7539-4396
- name: Rico Meinl
roles: [ contributor ]
info:
github: ricomnl
orcid: 0000-0003-4356-6058
# technical settings
organization: openproblems-bio
viash_version: 0.9.0
info:
test_resources:
- type: s3
path: s3://openproblems-data/resources/task_perturbation_prediction/datasets
dest: resources/datasets
# set default labels
config_mods: |
.runners[.type == "nextflow"].config.labels := { lowmem : "memory = 20.Gb", midmem : "memory = 50.Gb", highmem : "memory = 100.Gb", lowcpu : "cpus = 5", midcpu : "cpus = 15", highcpu : "cpus = 30", lowtime : "time = 1.h", midtime : "time = 4.h", hightime : "time = 8.h", veryhightime : "time = 24.h" }