Roadmap #54

Closed
rcannood opened this issue May 3, 2023 · 4 comments

Comments

@rcannood
Collaborator

rcannood commented May 3, 2023

Proposed interface

library(anndataR)

# read from h5ad/h5mu file
adata <- read_h5ad("dataset.h5ad", backend = "HDF5AnnData")
adata <- read_h5ad("dataset.h5ad", backend = "InMemoryAnnData")

# anndata-like interface (the Python package)
adata$X
adata$obs
adata$var

# optional feature 1: S3 helper functions for a base R-like interface
adata[1:10, 2:30]
dim(adata)
dimnames(adata)
as.matrix(adata, layer = NULL)
as.matrix(adata, layer = "counts")
t(adata)

# optional feature 2: S3 helper functions for a bioconductor-like interface
rowData(adata)
colData(adata)
reducedDimNames(adata)

# converters from/to sce
sce <- adata$to_SingleCellExperiment()
from_sce(sce)

# optional feature 3: converters from/to Seurat
seu <- adata$to_Seurat()
from_seurat(seu)

# optional feature 4: converters from/to SOMA
som <- adata$to_SOMA()
from_soma(som)
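The "optional feature 1" helpers above could be provided as S3 methods on the object's class. Below is a minimal sketch under stated assumptions: `ToyAnnData` and `new_toy_anndata()` are hypothetical stand-ins (a plain list holding `X`, `obs`, `var` and `layers`), not the anndataR classes, and the `layer` argument simply mirrors the proposed `as.matrix(adata, layer = ...)` call.

```r
# Hypothetical stand-in for an AnnData object: a plain list with an S3 class.
# This is an illustration only, not the anndataR implementation.
new_toy_anndata <- function(X, obs, var, layers = list()) {
  stopifnot(nrow(X) == nrow(obs), ncol(X) == nrow(var))
  structure(
    list(X = X, obs = obs, var = var, layers = layers),
    class = "ToyAnnData"
  )
}

# dim() and dimnames() are internal generics, so S3 methods are dispatched.
dim.ToyAnnData <- function(x) dim(x$X)
dimnames.ToyAnnData <- function(x) list(rownames(x$obs), rownames(x$var))

# as.matrix() returns X, or a named layer, labelled with obs/var names.
as.matrix.ToyAnnData <- function(x, layer = NULL, ...) {
  m <- if (is.null(layer)) x$X else x$layers[[layer]]
  dimnames(m) <- dimnames(x)
  m
}

# t() swaps observations and variables.
t.ToyAnnData <- function(x) {
  new_toy_anndata(t(x$X), obs = x$var, var = x$obs, layers = lapply(x$layers, t))
}

# adata[i, j] subsets X, obs, var and every layer consistently.
`[.ToyAnnData` <- function(x, i, j, ...) {
  if (missing(i)) i <- seq_len(nrow(x$X))
  if (missing(j)) j <- seq_len(ncol(x$X))
  new_toy_anndata(
    X = x$X[i, j, drop = FALSE],
    obs = x$obs[i, , drop = FALSE],
    var = x$var[j, , drop = FALSE],
    layers = lapply(x$layers, function(m) m[i, j, drop = FALSE])
  )
}

# Usage with made-up data
X <- matrix(1:12, nrow = 3)
obs <- data.frame(group = c("a", "b", "c"), row.names = c("c1", "c2", "c3"))
var <- data.frame(symbol = c("A", "B", "C", "D"), row.names = c("g1", "g2", "g3", "g4"))
adata <- new_toy_anndata(X, obs, var, layers = list(counts = X * 2L))
sub <- adata[1:2, 2:3]
```

Because the helpers are ordinary S3 methods, they can sit on top of whichever object system the classes themselves end up using.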

Class diagram

classDiagram
  class AbstractAnnData {
    *X: Matrix
    *layers: List[Matrix]
    *obs: DataFrame
    *var: DataFrame
    *obsp: List[Matrix]
    *varp: List[Matrix]
    *obsm: List[Matrix]
    *varm: List[Matrix]
    *uns: List
    *n_obs: int
    *n_vars: int
    *obs_names: Array[String]
    *var_names: Array[String]
    *subset(...): AbstractAnnData
    *write_h5ad(): Unit

    to_SingleCellExperiment(): SingleCellExperiment
    to_Seurat(): Seurat

    to_HDF5AnnData(): HDF5AnnData
    to_ZarrAnnData(): ZarrAnnData
    to_InMemoryAnnData(): InMemoryAnnData
  }

  AbstractAnnData <|-- HDF5AnnData
  class HDF5AnnData {
    init(h5file): HDF5AnnData
  }

  AbstractAnnData <|-- ZarrAnnData
  class ZarrAnnData {
    init(zarrFile): ZarrAnnData
  }

  AbstractAnnData <|-- InMemoryAnnData
  class InMemoryAnnData {
    init(X, obs, var, shape, ...): InMemoryAnnData
  }

  AbstractAnnData <|-- ReticulateAnnData
  class ReticulateAnnData {
    init(pyobj): ReticulateAnnData
  }

  class anndataR {
    read_h5ad(path, backend): Either[AbstractAnnData, SingleCellExperiment, Seurat]
  }
  anndataR --> AbstractAnnData

Notation:

  • X: Matrix - variable X is of type Matrix
  • *X: Matrix - variable X is abstract
  • to_SingleCellExperiment(): SingleCellExperiment - function to_SingleCellExperiment returns object of type SingleCellExperiment
  • *to_SingleCellExperiment() - function to_SingleCellExperiment is abstract
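As a rough illustration of the abstract/concrete split in the diagram, here is a sketch using Reference Classes from the `methods` package that ships with R, chosen only so the example runs without extra dependencies. All class and method names (`AbstractToyAnnData`, `LazyToyAnnData`, `get_X()`, `to_in_memory()`, `read_slot`) are hypothetical, and abstract members are modelled as methods that raise an error rather than as the fields shown in the diagram.

```r
library(methods)

# Abstract base: backends are expected to override the get_*() methods.
# Derived members (n_obs, n_vars, to_in_memory) are written once here.
AbstractToyAnnData <- setRefClass(
  "AbstractToyAnnData",
  methods = list(
    get_X = function() stop("get_X() is abstract"),
    get_obs = function() stop("get_obs() is abstract"),
    get_var = function() stop("get_var() is abstract"),
    n_obs = function() nrow(get_obs()),
    n_vars = function() nrow(get_var()),
    to_in_memory = function() {
      # Convert any backend by reading every slot through the abstract API.
      methods::new("InMemoryToyAnnData",
                   X = get_X(), obs = get_obs(), var = get_var())
    }
  )
)

# Backend 1: everything is held in memory as fields.
InMemoryToyAnnData <- setRefClass(
  "InMemoryToyAnnData",
  contains = "AbstractToyAnnData",
  fields = list(X = "matrix", obs = "data.frame", var = "data.frame"),
  methods = list(
    get_X = function() X,
    get_obs = function() obs,
    get_var = function() var
  )
)

# Backend 2: slots are fetched on demand through a reader function.
# The reader stands in for file access (for example an HDF5 handle).
LazyToyAnnData <- setRefClass(
  "LazyToyAnnData",
  contains = "AbstractToyAnnData",
  fields = list(read_slot = "function"),
  methods = list(
    get_X = function() read_slot("X"),
    get_obs = function() read_slot("obs"),
    get_var = function() read_slot("var")
  )
)

# Usage with a made-up in-memory "store" playing the role of a file
store <- list(
  X = matrix(1:6, nrow = 2),
  obs = data.frame(group = c("a", "b")),
  var = data.frame(symbol = c("A", "B", "C"))
)
lazy <- LazyToyAnnData$new(read_slot = function(name) store[[name]])
mem <- lazy$to_in_memory()
```

The point of the split is that converters and derived properties only need to be written against the abstract interface, so a new backend only has to supply the slot accessors.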

OO-framework

S4, RC, or R6?

  • S4 offers formal class definitions and multiple dispatch, making it suitable for complex projects, but may be verbose and slower compared to other systems.
  • RC provides reference semantics, familiar syntax, and encapsulation, yet it is less popular and can have performance issues.
  • R6 presents a simple and efficient OOP system with reference semantics and growing popularity, but lacks multiple dispatch and the formality of S4.

Choosing an OOP system depends on the project requirements, developer familiarity, and desired balance between formality, performance, and ease of use.
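To make two of the trade-offs concrete, the sketch below contrasts S4 multiple dispatch and copy-on-modify behaviour with the reference semantics of RC. RC is used here only because it ships with R; R6 objects are also modified in place. All class and function names (`ToyDense`, `ToySparse`, `combine_toy()`, `ToyCounter`) are made up for the illustration.

```r
library(methods)

# S4: formal class definitions and dispatch on more than one argument.
setClass("ToyDense", slots = c(x = "numeric"))
setClass("ToySparse", slots = c(x = "numeric"))
setGeneric("combine_toy", function(a, b) standardGeneric("combine_toy"))
setMethod("combine_toy", signature("ToyDense", "ToyDense"),
          function(a, b) "dense+dense")
setMethod("combine_toy", signature("ToyDense", "ToySparse"),
          function(a, b) "dense+sparse")

# S4 objects follow R's usual copy-on-modify rules:
# the function changes its own copy, the caller's object is untouched.
bump_s4 <- function(obj) {
  obj@x <- obj@x + 1
  obj
}

# RC: objects are references, so a method can modify the object in place.
ToyCounter <- setRefClass(
  "ToyCounter",
  fields = list(n = "numeric"),
  methods = list(
    increment = function() {
      n <<- n + 1
      invisible(.self)
    }
  )
)
bump_rc <- function(obj) {
  obj$increment()
  invisible(NULL)
}

# Usage
dense <- new("ToyDense", x = 1)
sparse <- new("ToySparse", x = 1)
bumped <- bump_s4(dense)      # 'dense' itself keeps x = 1
counter <- ToyCounter$new(n = 0)
bump_rc(counter)              # 'counter' is changed in place
```

Reference semantics are a natural fit for a file-backed object, where copying on every modification would be costly or ambiguous.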

Approach

  • Implement inheritance objects for AbstractAnnData, HDF5AnnData, InMemoryAnnData
  • Only containing X, layers, obs, var for now
  • Implement base R S3 generics
  • Implement read_h5ad(), $write_h5ad()
  • Implement $to_SingleCellExperiment()
  • Add simple unit tests

Optional:

  • Add more fields (obsp, obsm, varp, varm, ...) --> see class diagram
  • Start implementing MuData
  • Implement $to_Seurat()
  • Implement ZarrAnnData
  • Implement ReticulateAnnData
  • Implement Bioconductor S3 generics

Challenges - Previously encountered issues

Below are issues previously encountered when reading h5ad files using hdf5r. They could be used
to create test cases.

No test data yet:

Roadmap

Should we create a public road map / can I add the following items to the project board?

Before release 0.1.0:

Parallel to current release cycle:

After release 0.1.0:

  • Add extra slots (varm, varp, obsm, obsp) to AbstractAnnData (Add remaining slots to AbstractAnnData #50)
    • Implement InMemory extra slots (getters and setters)
    • Implement HDF5 extra slots (getters and setters)
    • Implement SingleCellExperimentConverter extra slots
    • Implement SeuratConverter extra slots
  • Implement ReticulateAnnData (Reticulate backend #48)
@mtmorgan
Collaborator

mtmorgan commented May 3, 2023

All sounds good with me. Should we also start to be more disciplined about version number bumps with merges to the main branch? Maybe there's automation for that too...?

@rcannood
Collaborator Author

rcannood commented May 3, 2023

Good point. How about we release version 0.1.0 once support for X, obs, var, obs_names, var_names and layers is implemented and unit tested for HDF5AnnData, InMemoryAnnData, SeuratConverter and SingleCellExperimentConverter?

From that point on, we add additional functionality by merging PRs into the main branch and adding changelog entries to CHANGELOG.md.

In my opinion we shouldn't merge #50 until the current classes are feature complete. @lazappi @mtmorgan WDYT?

@rcannood rcannood mentioned this issue May 3, 2023
@mtmorgan
Collaborator

mtmorgan commented May 5, 2023

I'll mention https://github.com/neurogenomics/scKirby which I recently became aware of @bschilder. A very cursory look indicates that it could definitely leverage anndataR (it currently uses anndata) when it matures. It also contains 'innovations' like reticulate & basilisk conda-based environments for structured control of Python dependencies.

rcannood added a commit that referenced this issue Sep 19, 2023
@rcannood rcannood mentioned this issue Sep 19, 2023
rcannood added a commit that referenced this issue Sep 19, 2023
* update docs

* move class diagram to vignette

* remove doc folder to #54

* don't include design doc in built package
@rcannood
Collaborator Author

rcannood commented Nov 5, 2024

The roadmap is now documented in milestones and vignettes, so closing this issue.

@rcannood rcannood closed this as completed Nov 5, 2024