Spatial Decomposition

Estimation of cell type proportions per spot in 2D space from spatial transcriptomic data coupled with corresponding single-cell data

Repository: openproblems-bio/task_spatial_decomposition

Description

Spatial decomposition (also often referred to as spatial deconvolution) applies to spatial transcriptomics data where the transcription profile of each capture location (spot, voxel, bead, etc.) does not share a bijective relationship with the cells in the tissue, i.e., multiple cells may contribute to the same capture location. The task of spatial decomposition is then to estimate the composition of cell types/states present at each capture location. The cell type/state estimates are presented as proportion values, representing the proportion of the cells at each capture location that belong to a given cell type.

We distinguish between reference-based decomposition and de novo decomposition: the former leverages external data (e.g., scRNA-seq or scNuc-seq) to guide the inference process, while the latter works only with the spatial data. We require that all datasets have an associated reference single-cell dataset, but methods are free to ignore this information.

Due to the lack of real datasets with the necessary ground truth, this task uses a simulated dataset generated by creating cell aggregates by sampling from a Dirichlet distribution. The ground-truth dataset consists of the spatial expression matrix, XY coordinates of the spots, true cell-type proportions for each spot, and the reference single-cell data (from which the cell aggregates were simulated).
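The text does not spell out the simulation procedure, so the following is only a minimal sketch of one way to build a cell aggregate from Dirichlet-distributed mixing weights. The function names, the toy reference, the `alpha` values, the number of cells per spot, and the choice to record the realized (rather than the sampled) proportions are all assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_spot(cells_by_type, alpha, n_cells, rng):
    """Build one synthetic spot (hypothetical helper, not the task's code).

    cells_by_type maps a cell type name to a list of per-cell count vectors.
    Returns the summed counts and the realized cell-type proportions.
    """
    types = sorted(cells_by_type)
    weights = sample_dirichlet(alpha, rng)
    # Choose the cell type of every cell in the aggregate from the weights.
    picked = rng.choices(types, weights=weights, k=n_cells)
    n_genes = len(cells_by_type[types[0]][0])
    counts = [0] * n_genes
    for cell_type in picked:
        # Sample one reference cell of that type and add its counts.
        cell = rng.choice(cells_by_type[cell_type])
        counts = [c + x for c, x in zip(counts, cell)]
    proportions = [picked.count(t) / n_cells for t in types]
    return counts, proportions


# Toy reference with two cell types and three genes (made-up numbers).
reference = {
    "type_a": [[5, 0, 1], [4, 1, 0]],
    "type_b": [[0, 6, 2], [1, 7, 1]],
}
rng = random.Random(0)
counts, proportions = simulate_spot(reference, alpha=[1.0, 1.0], n_cells=10, rng=rng)
print(counts, proportions)
```

Small `alpha` values would concentrate each spot on few cell types, while large values push spots toward even mixtures.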

Authors & contributors

| name | roles |
|:---|:---|
| Alma Andersson | author, maintainer |
| Giovanni Palla | author, maintainer |
| Vitalii Kleshchevnikov | author |
| Hirak Sarkar | author |
| Scott Gigante | author |
| Daniel Burkhardt | contributor |
| Can Ergen | contributor |
| Sai Nirmayi Yasa | contributor |

API

flowchart TB
  file_common_dataset("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-common-dataset'>Common Dataset</a>")
  comp_process_dataset[/"<a href='https://github.com/openproblems-bio/task_spatial_decomposition#component-type-data-processor'>Data processor</a>"/]
  file_single_cell("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-single-cell-data'>Single cell data</a>")
  file_solution("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-solution'>Solution</a>")
  file_spatial_masked("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-spatial-masked'>Spatial masked</a>")
  comp_control_method[/"<a href='https://github.com/openproblems-bio/task_spatial_decomposition#component-type-control-method'>Control method</a>"/]
  comp_method[/"<a href='https://github.com/openproblems-bio/task_spatial_decomposition#component-type-method'>Method</a>"/]
  comp_metric[/"<a href='https://github.com/openproblems-bio/task_spatial_decomposition#component-type-metric'>Metric</a>"/]
  file_output("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-output'>Output</a>")
  file_score("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-score'>Score</a>")
  file_simulated_dataset("<a href='https://github.com/openproblems-bio/task_spatial_decomposition#file-format-common-dataset'>Common Dataset</a>")
  file_common_dataset---comp_process_dataset
  comp_process_dataset-->file_single_cell
  comp_process_dataset-->file_solution
  comp_process_dataset-->file_spatial_masked
  file_single_cell---comp_control_method
  file_single_cell---comp_method
  file_solution---comp_control_method
  file_solution---comp_metric
  file_spatial_masked---comp_control_method
  file_spatial_masked---comp_method
  comp_control_method-->file_output
  comp_method-->file_output
  comp_metric-->file_score
  file_output---comp_metric

File format: Common Dataset

A subset of the common dataset.

Example file: resources_test/common/cxg_mouse_pancreas_atlas/dataset.h5ad

Format:

AnnData object
 obs: 'cell_type', 'batch'
 var: 'hvg', 'hvg_score'
 obsm: 'X_pca'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | Cell type label IDs. |
| `obs["batch"]` | `string` | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
| `obsm["X_pca"]` | `double` | (Optional) The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["cell_type_names"]` | `string` | (Optional) Cell type names corresponding to values in cell_type. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (Optional) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (Optional) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (Optional) The organism of the sample in the dataset. |
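A slot listing like the one above lends itself to a simple presence check. The sketch below is a hypothetical helper (`missing_slots` and `REQUIRED_SLOTS` are not part of the task's code) that reports which non-optional slots are absent. It only relies on `in` lookups, which should behave the same on an AnnData object's `obs`, `var`, `layers` and `uns`, but it is demonstrated here on a plain stand-in object so that it runs without extra packages.

```python
from types import SimpleNamespace

# Non-optional slots of the common dataset, copied from the listing above.
REQUIRED_SLOTS = {
    "obs": ["cell_type", "batch"],
    "var": ["hvg", "hvg_score"],
    "layers": ["counts"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary", "dataset_description"],
}


def missing_slots(adata, required=REQUIRED_SLOTS):
    """Return 'container["key"]' strings for every required slot that is absent."""
    missing = []
    for container, keys in required.items():
        present = getattr(adata, container)
        for key in keys:
            if key not in present:
                missing.append(f'{container}["{key}"]')
    return missing


# Stand-in object for illustration only. With the anndata package installed,
# one would instead pass the object returned by anndata.read_h5ad(...).
mock = SimpleNamespace(
    obs={"cell_type": [], "batch": []},
    var={"hvg": []},
    layers={"counts": None},
    uns={"dataset_id": "demo", "dataset_name": "Demo"},
)
print(missing_slots(mock))
```

The same function can check the other file formats by passing a different `required` mapping.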

Component type: Data processor

A spatial decomposition dataset processor.

Arguments:

| Name | Type | Description |
|:---|:---|:---|
| `--input` | `file` | A subset of the common dataset. |
| `--output_single_cell` | `file` | (Output) The single-cell data file used as reference for the spatial data. |
| `--output_spatial_masked` | `file` | (Output) The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
| `--output_solution` | `file` | (Output) The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |

File format: Single cell data

The single-cell data file used as reference for the spatial data.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/single_cell_ref.h5ad

Format:

AnnData object
 obs: 'cell_type', 'batch'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | Cell type label IDs. |
| `obs["batch"]` | `string` | (Optional) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["cell_type_names"]` | `string` | Cell type names corresponding to values in cell_type. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |

File format: Solution

The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/solution.h5ad

Format:

AnnData object
 obsm: 'spatial', 'proportions_true'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
| `obsm["proportions_true"]` | `double` | True cell type proportions for each spot. |
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of proportions_true. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (Optional) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (Optional) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (Optional) The organism of the sample in the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

File format: Spatial masked

The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/spatial_masked.h5ad

Format:

AnnData object
 obsm: 'spatial'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of proportions_pred in output. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |

Component type: Control method

Quality control methods for verifying the pipeline.

Arguments:

| Name | Type | Description |
|:---|:---|:---|
| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
| `--input_spatial_masked` | `file` | The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
| `--input_solution` | `file` | The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
| `--output` | `file` | (Output) Spatial data with estimated proportions. |
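Unlike a regular method, a control method also receives the solution, so it can bracket the range of scores a metric should produce. The sketch below shows two plausible controls under that reading: one that returns the ground truth unchanged and one that ignores the data and returns random proportions. Both function names are hypothetical, and nothing here is taken from the task's actual control methods.

```python
import random


def true_proportions_control(proportions_true):
    """Positive control (assumed design): echo the ground truth."""
    return [list(row) for row in proportions_true]


def random_proportions_control(n_spots, n_cell_types, seed=0):
    """Negative control (assumed design): uniform Dirichlet draw per spot."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_spots):
        # Normalized Gamma(1, 1) draws give a flat Dirichlet sample.
        draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_cell_types)]
        total = sum(draws)
        predictions.append([d / total for d in draws])
    return predictions


truth = [[0.2, 0.8], [0.5, 0.5], [1.0, 0.0]]
upper = true_proportions_control(truth)
lower = random_proportions_control(n_spots=len(truth), n_cell_types=2)
print(upper == truth, lower)
```

A metric that scores the first control no better than the second would point to a problem in the pipeline rather than in any method.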

Component type: Method

A spatial decomposition method.

Arguments:

| Name | Type | Description |
|:---|:---|:---|
| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
| `--input_spatial_masked` | `file` | The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
| `--output` | `file` | (Output) Spatial data with estimated proportions. |
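To make the two inputs concrete, here is a deliberately naive reference-based sketch: it averages the reference counts per cell type into signatures, fits non-negative mixing weights for a spot by projected gradient descent, and normalizes them to proportions. This is an illustrative baseline under simplifying assumptions (every cell contributes equally, no noise model), not a description of any method in the benchmark, and all names are hypothetical.

```python
def cell_type_signatures(counts, labels, cell_type_names):
    """Mean expression vector per cell type, ordered like cell_type_names."""
    signatures = []
    for name in cell_type_names:
        rows = [row for row, label in zip(counts, labels) if label == name]
        signatures.append([sum(col) / len(rows) for col in zip(*rows)])
    return signatures


def estimate_proportions(spot, signatures, n_iter=500):
    """Non-negative least squares via projected gradient, then normalize."""
    k = len(signatures)
    # 1 / (sum of squared entries) is a conservative but safe step size.
    step = 1.0 / sum(v * v for sig in signatures for v in sig)
    weights = [0.0] * k
    for _ in range(n_iter):
        fitted = [
            sum(w * sig[g] for w, sig in zip(weights, signatures))
            for g in range(len(spot))
        ]
        residual = [f - x for f, x in zip(fitted, spot)]
        grads = [sum(r * v for r, v in zip(residual, sig)) for sig in signatures]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, grads)]
    total = sum(weights)
    if total == 0:
        return [1.0 / k] * k  # no signal: fall back to uniform proportions
    return [w / total for w in weights]


# Toy reference: four cells, two cell types, three genes (made-up numbers).
ref_counts = [[10, 0, 2], [8, 1, 3], [0, 9, 1], [1, 11, 0]]
ref_labels = ["A", "A", "B", "B"]
signatures = cell_type_signatures(ref_counts, ref_labels, ["A", "B"])

# A noiseless spot made of 5 "A" cells and 15 "B" cells.
spot = [5 * a + 15 * b for a, b in zip(*signatures)]
pred = estimate_proportions(spot, signatures)
print(pred)
```

On noiseless toy data like this the weights converge to the true mixture. Real spots are noisy and sparse, which is why dedicated methods use more careful likelihoods.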

Component type: Metric

A spatial decomposition metric.

Arguments:

| Name | Type | Description |
|:---|:---|:---|
| `--input_method` | `file` | Spatial data with estimated proportions. |
| `--input_solution` | `file` | The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
| `--output` | `file` | (Output) Metric score file. |
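The argument table above can be mirrored as a small command-line parser. This is only a sketch using Python's standard `argparse`. How the task's components actually receive their arguments is not stated here, and the file names passed in the example are placeholders.

```python
import argparse


def build_parser():
    """Parser mirroring the metric arguments listed above (illustrative)."""
    parser = argparse.ArgumentParser(description="Spatial decomposition metric (sketch).")
    parser.add_argument("--input_method", required=True,
                        help="Spatial data with estimated proportions.")
    parser.add_argument("--input_solution", required=True,
                        help="Spatial data with true cell-type proportions.")
    parser.add_argument("--output", required=True,
                        help="Metric score file.")
    return parser


# Placeholder file names, parsed from an explicit list instead of sys.argv.
args = build_parser().parse_args([
    "--input_method", "output.h5ad",
    "--input_solution", "solution.h5ad",
    "--output", "score.h5ad",
])
print(args.input_method, args.input_solution, args.output)
```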

File format: Output

Spatial data with estimated proportions.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/output.h5ad

Format:

AnnData object
 obsm: 'spatial', 'proportions_pred'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id', 'method_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
| `obsm["proportions_pred"]` | `double` | Estimated cell type proportions for each spot. |
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of proportions_pred. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |
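Before writing an output file, it can help to sanity-check the predicted proportions. The sketch below assumes that each row should have one column per entry of `cell_type_names`, contain no negative or NaN values, and sum to 1. The last condition follows from reading the values as proportions and is not stated as a hard requirement in this document. `check_proportions` is a hypothetical helper.

```python
import math


def check_proportions(proportions_pred, cell_type_names, tol=1e-6):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    n_types = len(cell_type_names)
    for i, row in enumerate(proportions_pred):
        if len(row) != n_types:
            problems.append(f"spot {i}: expected {n_types} columns, got {len(row)}")
            continue
        if any(math.isnan(p) or p < 0 for p in row):
            problems.append(f"spot {i}: negative or NaN proportion")
        elif abs(sum(row) - 1.0) > tol:
            problems.append(f"spot {i}: proportions sum to {sum(row):.4f}, not 1")
    return problems


names = ["A", "B"]
print(check_proportions([[0.5, 0.5], [0.7, 0.2]], names))
```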

File format: Score

Metric score file.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/score.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |
| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
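As an illustration of how the score slots fit together, the sketch below computes a root-mean-square error between true and predicted proportions and packs it into a dictionary shaped like the `uns` slots above. RMSE and the identifier `demo_rmse` are stand-ins chosen for the example. The metrics this task actually reports are not listed in this section.

```python
import math


def rmse(proportions_true, proportions_pred):
    """Root-mean-square error over all spots and cell types."""
    squared = [
        (t - p) ** 2
        for row_true, row_pred in zip(proportions_true, proportions_pred)
        for t, p in zip(row_true, row_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


def make_score(dataset_id, method_id, metrics):
    """Build a dict mirroring the score slots; metrics maps id -> value."""
    score = {
        "dataset_id": dataset_id,
        "method_id": method_id,
        "metric_ids": list(metrics),
        "metric_values": [float(v) for v in metrics.values()],
    }
    # The format requires one value per metric id.
    assert len(score["metric_ids"]) == len(score["metric_values"])
    return score


truth = [[0.5, 0.5], [1.0, 0.0]]
pred = [[0.5, 0.5], [0.8, 0.2]]
score = make_score("demo_dataset", "demo_method", {"demo_rmse": rmse(truth, pred)})
print(score)
```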

File format: Common Dataset

A subset of the common dataset.

Example file: resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/simulated_dataset.h5ad

Format:

AnnData object
 obs: 'cell_type', 'batch'
 var: 'hvg', 'hvg_score'
 obsm: 'X_pca', 'spatial', 'proportions_true'
 layers: 'counts'
 uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | Cell type label IDs. |
| `obs["batch"]` | `string` | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `obsm["spatial"]` | `double` | (Optional) XY coordinates for each spot. |
| `obsm["proportions_true"]` | `double` | (Optional) True cell type proportions for each spot. |
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["cell_type_names"]` | `string` | (Optional) Cell type names corresponding to values in cell_type. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (Optional) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (Optional) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (Optional) The organism of the sample in the dataset. |