(feat): read_elem_as_dask method #1469

Merged
merged 160 commits into from
Jul 23, 2024
Commits
d111f04
(feat): `read_elem_lazy` method
ilan-gold Apr 11, 2024
00be7f0
(revert): error message
ilan-gold Apr 11, 2024
fd635d7
(refactor): declare `is_csc` reading elem directly in h5
ilan-gold Apr 11, 2024
f5e7fda
(chore): `read_elem_lazy` -> `read_elem_as_dask`
ilan-gold Apr 12, 2024
ae5396c
(chore): remove string handling
ilan-gold Apr 12, 2024
664336a
(refactor): use `elem` for h5 where posssble
ilan-gold Apr 12, 2024
2370215
Merge branch 'main' into ig/read_dask_elem
ilan-gold Apr 17, 2024
52002b6
(chore): remove invlaud syntax
ilan-gold Apr 17, 2024
5ab1ad1
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Apr 17, 2024
aa1006e
(fix): put dask import inside function
ilan-gold Apr 17, 2024
dda7d83
(refactor): try maybe open?
ilan-gold Apr 17, 2024
fd418f0
Merge branch 'main' into ig/read_dask_elem
ilan-gold May 27, 2024
23b0bfd
Merge branch 'main' into ig/read_dask_elem
ilan-gold May 27, 2024
97b8031
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jun 3, 2024
1fc4cc3
(fix): revert `encoding-version`
ilan-gold Jun 3, 2024
5ca71ea
(chore): document `create_sparse_store` test function
ilan-gold Jun 3, 2024
3672c18
(chore): sort indices to prevent warning
ilan-gold Jun 3, 2024
33c3599
(fix): remove utility function `make_dask_array`
ilan-gold Jun 3, 2024
157e710
(chore): `read_sparse_as_dask_h5` -> `read_sparse_as_dask`
ilan-gold Jun 3, 2024
375000d
(feat): make params of `h5_chunks` and `stride`
ilan-gold Jun 3, 2024
241904a
(chore): add distributed test
ilan-gold Jun 3, 2024
42d0d22
(fix): `TypeVar` bind
ilan-gold Jun 3, 2024
0bba2c0
(chore): release note
ilan-gold Jun 4, 2024
0d0b43a
(chore): `0.10.8` -> `0.11.0`
ilan-gold Jun 5, 2024
762d4c6
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jun 26, 2024
c935fe0
(fix): `ruff` for default `pytest.fixture` `scope`
ilan-gold Jun 26, 2024
23e0ea2
Apply suggestions from code review
ilan-gold Jul 1, 2024
5b96c77
(fix): `Any` to `DaskArray`
ilan-gold Jul 1, 2024
0907a4e
(fix): type `make_index` + fix undeclared
ilan-gold Jul 1, 2024
20ced16
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 1, 2024
36ae8f2
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 1, 2024
bb6607e
fix rest
flying-sheep Jul 1, 2024
419691b
(fix): use `chunks` kwarg
ilan-gold Jul 2, 2024
a23df34
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 2, 2024
fd2376a
(feat): expose `chunks` as an option to `read_elem_as_dask` via `data…
ilan-gold Jul 2, 2024
ae723d0
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Jul 2, 2024
42b1093
(fix): `test_read_dispatched_null_case` test
ilan-gold Jul 2, 2024
78de057
(fix): disallowed spread syntax?
ilan-gold Jul 2, 2024
717b997
(refactor): reuse `compute_chunk_layout_for_axis_shape` functionality
ilan-gold Jul 2, 2024
2b86293
(fix): remove unneeded `slice` arguments
ilan-gold Jul 3, 2024
8d5a9df
(fix): revert message
ilan-gold Jul 3, 2024
449fc1a
(refactor): `make_index` -> `make_block_indexer`
ilan-gold Jul 3, 2024
1522de3
(fix): export from `experimental`
ilan-gold Jul 3, 2024
71c150d
(fix): `callback` signature for `test_read_dispatched_null_case
ilan-gold Jul 3, 2024
b441366
(chore): `get_elem_name` helper
ilan-gold Jul 3, 2024
0307a1d
(chore): use `H5Group` consistently
ilan-gold Jul 3, 2024
ee075cd
(refactor): make `chunks` public facing API instead of `dataset_kwargs`
ilan-gold Jul 3, 2024
89acec4
(fix): regsiter for group not array
ilan-gold Jul 3, 2024
48b7630
(chore): add warning test
ilan-gold Jul 3, 2024
8712582
(chore): make arg order consistent
ilan-gold Jul 3, 2024
cda8aa7
(feat): add `callback` typing for `read_dispatched`
ilan-gold Jul 5, 2024
e8f62f4
(chore): use `npt.NDArray`
ilan-gold Jul 5, 2024
f6e48ac
(fix): remove uneceesary union
ilan-gold Jul 5, 2024
4de3246
(chore): release note
ilan-gold Jul 5, 2024
ba817e0
(fix); try protocol docs
ilan-gold Jul 5, 2024
438d28d
(feat): create `InMemoryElem` + `DictElemType` to remove `Any`
ilan-gold Jul 5, 2024
296ea3f
(chore): refactor `DictElemType` -> `InMemoryArrayOrScalarType` for r…
ilan-gold Jul 5, 2024
cf13a57
(fix): use `Union`
ilan-gold Jul 5, 2024
d02ba49
(fix): more `Union`
ilan-gold Jul 5, 2024
6970a97
(refactor): `InMemoryElem` -> `InMemoryReadElem`
ilan-gold Jul 5, 2024
2282351
(chore): add needed types to public export + docs fix
ilan-gold Jul 5, 2024
810cd0a
Merge branch 'main' into ig/read_dask_elem
flying-sheep Jul 8, 2024
a996081
(chore): type `write_elem` functions
ilan-gold Jul 8, 2024
f6e457b
(chore): create `write_callback` protocol
ilan-gold Jul 8, 2024
a0b4057
Merge branch 'main' into ig/protocol_for_callback
ilan-gold Jul 8, 2024
4416526
(chore): export + docs
ilan-gold Jul 8, 2024
fbe44f0
(fix): add string descriptions
ilan-gold Jul 8, 2024
8c1f01d
(fix): try sphinx protocol doc
ilan-gold Jul 8, 2024
a7d412a
(fix): try ignoring exports
ilan-gold Jul 8, 2024
4d56396
(fix): remap callback internal usages
ilan-gold Jul 8, 2024
2012ee5
(fix): add docstring
ilan-gold Jul 8, 2024
f65f065
Discard changes to pyproject.toml
flying-sheep Jul 9, 2024
8f6ea49
re-add dep
flying-sheep Jul 9, 2024
155a21e
Fix docs
flying-sheep Jul 9, 2024
daae3e5
Almost works
flying-sheep Jul 9, 2024
c415ae4
works!
flying-sheep Jul 9, 2024
00010b8
(chore): use pascal-case
ilan-gold Jul 9, 2024
0bd87fc
(feat): type read/write funcs in callback
ilan-gold Jul 9, 2024
5997678
(fix): use generic for `Read` as well.
ilan-gold Jul 9, 2024
f208332
(fix): need more aliases
ilan-gold Jul 9, 2024
eb69fcb
Split table, format
flying-sheep Jul 9, 2024
477bbef
(refactor): move to `_types` file
ilan-gold Jul 9, 2024
103cad6
Merge branch 'ig/protocol_for_callback' of github.com:scverse/anndata…
ilan-gold Jul 9, 2024
8d23f6f
bump scanpydoc
flying-sheep Jul 9, 2024
9b647c2
Some basic syntax fixes
flying-sheep Jul 9, 2024
d6d01bc
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 9, 2024
5ef93e1
(fix): change `Read{Callback}` type for kwargs
ilan-gold Jul 9, 2024
9cfe908
(chore): test `chunks `argument
ilan-gold Jul 9, 2024
99fc6db
(fix): type `read_recarray`
ilan-gold Jul 9, 2024
b5bccc3
(fix): `GroupyStorageType` not `StorageType`
ilan-gold Jul 9, 2024
e5ea2b0
(fix): little type fixes
ilan-gold Jul 9, 2024
6ac72d6
(fix): clarify `H5File` typing
ilan-gold Jul 9, 2024
989dc65
(fix): dask doc
ilan-gold Jul 9, 2024
36b0207
(fix): dask docs
ilan-gold Jul 9, 2024
dadfb4d
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 9, 2024
ca6cf66
(fix): typing
ilan-gold Jul 9, 2024
eabaf35
(fix): handle case when `chunks` is `None`
ilan-gold Jul 9, 2024
4c398c3
(feat): add string-array reading
ilan-gold Jul 9, 2024
d6fc8a4
(fix): remove `string-array` because it is not tested
ilan-gold Jul 9, 2024
33aebb2
(refactor): clean up tests
ilan-gold Jul 10, 2024
701cd85
(fix): overfetching problem
ilan-gold Jul 10, 2024
43b21a2
Fix circular import
flying-sheep Jul 11, 2024
0e22449
add some typing
flying-sheep Jul 11, 2024
ec546f4
fix mapping types
flying-sheep Jul 11, 2024
7c2e4da
Fix Read/Write
flying-sheep Jul 11, 2024
1ba5b99
Fix one more
flying-sheep Jul 11, 2024
49c0d49
unify names
flying-sheep Jul 11, 2024
3666735
claift ReadCallback signature
flying-sheep Jul 11, 2024
3a332ad
Fix type aliases
flying-sheep Jul 11, 2024
d0f4d13
(fix): clean up typing to use `RWAble`
ilan-gold Jul 11, 2024
6e89e14
Merge branch 'main' into ig/protocol_for_callback
ilan-gold Jul 11, 2024
ea29cfa
(fix): use `Union`
ilan-gold Jul 11, 2024
f4ff236
(fix): add qualname override
ilan-gold Jul 11, 2024
f50b286
(fix): ignore dask and masked array
ilan-gold Jul 11, 2024
712e085
(fix): ignore erroneous class warning
ilan-gold Jul 11, 2024
24dd18b
(fix): upgrade `scanpydoc`
ilan-gold Jul 11, 2024
79d3fdc
(fix): use `MutableMapping` instead of `dict` due to broken docstring
ilan-gold Jul 11, 2024
9a2be00
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 11, 2024
d3bcddf
Add data docs
flying-sheep Jul 11, 2024
84fdc96
Revert "(fix): use `MutableMapping` instead of `dict` due to broken d…
flying-sheep Jul 11, 2024
2608bc3
(fix): add clarification
ilan-gold Jul 11, 2024
e551e18
Simplify
flying-sheep Jul 11, 2024
13e3bb1
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 11, 2024
2935e45
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 11, 2024
bf0be15
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Jul 11, 2024
9d37fc8
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 12, 2024
1ffe43e
(fix): remove double `dask` intersphinx
ilan-gold Jul 12, 2024
f9df5bc
(fix): remove `_types.DaskArray` from type checking block
ilan-gold Jul 12, 2024
a85da39
(refactor): use `block_info` for resolving fetch location
ilan-gold Jul 15, 2024
3bef77c
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Jul 15, 2024
899184f
(fix): dtype for reading
ilan-gold Jul 15, 2024
efb70ec
(fix): ignore import cycle problem (why??)
ilan-gold Jul 16, 2024
118f43c
(fix): add issue
ilan-gold Jul 16, 2024
f742a0a
(fix): subclass `Reader` to remove `datasetkwargs`
ilan-gold Jul 18, 2024
ae68731
(fix): add message tp errpr
ilan-gold Jul 18, 2024
f5e7760
Update tests/test_io_elementwise.py
ilan-gold Jul 18, 2024
96b13a3
(fix): correct `self.callback` check
ilan-gold Jul 18, 2024
9c68e36
(fix): erroneous diffs
ilan-gold Jul 18, 2024
410aeda
(fix): extra `read_elem` `dataset_kwargs`
ilan-gold Jul 18, 2024
31a30c4
(fix): remove more `dataset_kwargs` nonsense
ilan-gold Jul 18, 2024
80fe8cb
(chore): add docs
ilan-gold Jul 18, 2024
b314248
(fix): use `block_info` for dense
ilan-gold Jul 18, 2024
02d4735
(fix): more erroneous diffs
ilan-gold Jul 18, 2024
6e5534a
(fix): use context again
ilan-gold Jul 18, 2024
d26cfe8
(fix): change size by dimension in tests
ilan-gold Jul 22, 2024
94e43a3
(refactor): clean up `get_elem_name`
ilan-gold Jul 22, 2024
5160016
(fix): try new sphinx for error
ilan-gold Jul 22, 2024
43da9a3
(fix): return type
ilan-gold Jul 22, 2024
9735ced
(fix): protocol for reading
ilan-gold Jul 22, 2024
f1730c3
(fix): bring back ignored warning
ilan-gold Jul 22, 2024
9861b56
Fix docs
flying-sheep Jul 22, 2024
235096a
almost fix typing
flying-sheep Jul 22, 2024
dce9f07
add wrapper
flying-sheep Jul 22, 2024
2725ef2
move into type checking
flying-sheep Jul 22, 2024
ffe89f0
(fix): small type fxes
ilan-gold Jul 22, 2024
6cb231e
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 22, 2024
75a64fc
block info types
flying-sheep Jul 22, 2024
3f734fe
simplify
flying-sheep Jul 22, 2024
c4c2356
rename
flying-sheep Jul 22, 2024
cc67a9b
simplify more
flying-sheep Jul 22, 2024
(fix): subclass Reader to remove datasetkwargs
ilan-gold committed Jul 18, 2024
commit f742a0a8cbce3cc75f517c0a809964ea02dd834c
2 changes: 1 addition & 1 deletion src/anndata/_io/h5ad.py
@@ -233,7 +233,7 @@ def read_h5ad(

with h5py.File(filename, "r") as f:

- def callback(func, elem_name: str, elem, dataset_kwargs, iospec):
+ def callback(func, elem_name: str, elem, iospec):
if iospec.encoding_type == "anndata" or elem_name.endswith("/"):
return AnnData(
**{
23 changes: 7 additions & 16 deletions src/anndata/_io/specs/lazy_methods.py
@@ -3,7 +3,6 @@
from contextlib import contextmanager
from functools import singledispatch
from pathlib import Path, PurePosixPath
- from types import MappingProxyType
from typing import TYPE_CHECKING

import h5py
@@ -16,8 +15,7 @@
from .registry import _LAZY_REGISTRY, IOSpec

if TYPE_CHECKING:
- from collections.abc import Mapping
- from typing import Any, Literal, Union
+ from typing import Literal, Union

from .registry import Reader

@@ -67,9 +65,7 @@ def _(x):
@_LAZY_REGISTRY.register_read(ZarrGroup, IOSpec("csc_matrix", "0.1.0"))
@_LAZY_REGISTRY.register_read(ZarrGroup, IOSpec("csr_matrix", "0.1.0"))
def read_sparse_as_dask(
- elem: H5Group | ZarrGroup,
- _reader: Reader,
- dataset_kwargs: Mapping[str, Any] = MappingProxyType({}),
+ elem: H5Group | ZarrGroup, _reader: Reader, chunks: tuple[int, ...] | None = None
):
import dask.array as da

@@ -79,7 +75,6 @@ def read_sparse_as_dask(
dtype = elem["data"].dtype
is_csc: bool = elem.attrs["encoding-type"] == "csc_matrix"

- chunks = dataset_kwargs.get("chunks", None)
stride: int = _DEFAULT_STRIDE
if chunks is not None:
if len(chunks) != 2:
@@ -129,18 +124,16 @@ def make_dask_chunk(

@_LAZY_REGISTRY.register_read(H5Array, IOSpec("array", "0.2.0"))
def read_h5_array(
- elem: H5Array,
- _reader: Reader,
- dataset_kwargs: Mapping[str, Any] = MappingProxyType({}),
+ elem: H5Array, _reader: Reader, chunks: tuple[int, ...] | None = None
):
import dask.array as da

path = Path(elem.file.filename)
elem_name = elem.name
shape = tuple(elem.shape)
dtype = elem.dtype
- chunks: tuple[int, ...] = dataset_kwargs.get(
- "chunks", (_DEFAULT_STRIDE,) * len(shape)
+ chunks: tuple[int, ...] = (
+ chunks if chunks is not None else (_DEFAULT_STRIDE,) * len(shape)
)

def make_dask_chunk(block_id: tuple[int, int]):
@@ -166,11 +159,9 @@ def make_dask_chunk(block_id: tuple[int, int]):

@_LAZY_REGISTRY.register_read(ZarrArray, IOSpec("array", "0.2.0"))
def read_zarr_array(
- elem: ZarrArray,
- _reader: Reader,
- dataset_kwargs: Mapping[str, Any] = MappingProxyType({}),
+ elem: ZarrArray, _reader: Reader, chunks: tuple[int, ...] | None = None
):
- chunks: tuple[int, ...] = dataset_kwargs.get("chunks", elem.chunks)
+ chunks: tuple[int, ...] = chunks if chunks is not None else elem.chunks
import dask.array as da

return da.from_zarr(elem, chunks=chunks)
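Reviewer's sketch: the dense-array hunks above replace a `dataset_kwargs.get("chunks", ...)` lookup with an explicit `chunks` parameter that falls back to a default when it is `None`. The snippet below isolates that fallback for the HDF5 case. The function name `resolve_dense_chunks` is hypothetical, and the value of `_DEFAULT_STRIDE` is not shown in this diff, so `1000` is only a placeholder assumption.

```python
from __future__ import annotations

# Assumption: the real value of `_DEFAULT_STRIDE` does not appear in this diff;
# 1000 is a placeholder chosen purely for illustration.
DEFAULT_STRIDE = 1000


def resolve_dense_chunks(
    shape: tuple[int, ...], chunks: tuple[int, ...] | None = None
) -> tuple[int, ...]:
    """Return the caller's chunks, or one default stride per dimension."""
    return chunks if chunks is not None else (DEFAULT_STRIDE,) * len(shape)


print(resolve_dense_chunks((5000, 2000)))               # (1000, 1000)
print(resolve_dense_chunks((5000, 2000), (250, 2000)))  # (250, 2000)
```

For Zarr arrays the same pattern appears in the diff, except that the fallback is the store's own `elem.chunks` rather than a fixed stride.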
17 changes: 0 additions & 17 deletions src/anndata/_io/specs/methods.py
@@ -126,7 +126,6 @@ def read_basic(
elem: H5File | H5Group | H5Array,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> dict[str, InMemoryArrayOrScalarType] | npt.NDArray | sparse.spmatrix | SpArray:
from anndata._io import h5ad

@@ -151,7 +150,6 @@ def read_basic_zarr(
elem: ZarrGroup | ZarrArray,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> dict[str, InMemoryArrayOrScalarType] | npt.NDArray | sparse.spmatrix | SpArray:
from anndata._io import zarr

@@ -299,7 +297,6 @@ def read_anndata(
elem: GroupStorageType | H5File,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> AnnData:
d = {}
for k in [
@@ -346,7 +343,6 @@ def read_mapping(
elem: GroupStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> dict[str, RWAble]:
return {k: _reader.read_elem(v) for k, v in elem.items()}

@@ -460,7 +456,6 @@ def read_array(
elem: ArrayStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> npt.NDArray:
return elem[()]

@@ -482,7 +477,6 @@ def read_string_array(
d: H5Array,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
):
return read_array(d.asstr(), _reader=_reader)

@@ -568,7 +562,6 @@ def read_recarray(
d: ArrayStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> np.recarray | npt.NDArray:
value = d[()]
dtype = value.dtype
@@ -785,7 +778,6 @@ def read_sparse(
elem: GroupStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> sparse.spmatrix | SpArray:
return sparse_dataset(elem).to_memory()

@@ -835,7 +827,6 @@ def read_awkward(
elem: GroupStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> AwkArray:
from anndata.compat import awkward as ak

@@ -909,7 +900,6 @@ def read_dataframe(
elem: GroupStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> pd.DataFrame:
columns = list(_read_attr(elem.attrs, "column-order"))
idx_key = _read_attr(elem.attrs, "_index")
@@ -955,7 +945,6 @@ def read_dataframe_0_1_0(
elem: GroupStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> pd.DataFrame:
columns = _read_attr(elem.attrs, "column-order")
idx_key = _read_attr(elem.attrs, "_index")
@@ -1031,7 +1020,6 @@ def read_categorical(
elem: GroupStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> pd.Categorical:
return pd.Categorical.from_codes(
codes=_reader.read_elem(elem["codes"]),
@@ -1087,7 +1075,6 @@ def read_nullable_integer(
elem: GroupStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> pd.api.extensions.ExtensionArray:
if "mask" in elem:
return pd.arrays.IntegerArray(
@@ -1103,7 +1090,6 @@ def read_nullable_boolean(
elem: GroupStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> pd.api.extensions.ExtensionArray:
if "mask" in elem:
return pd.arrays.BooleanArray(
@@ -1124,7 +1110,6 @@ def read_scalar(
elem: ArrayStorageType,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> np.number:
return elem[()]

@@ -1176,7 +1161,6 @@ def read_hdf5_string(
elem: H5Array,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> str:
return elem.asstr()[()]

@@ -1186,7 +1170,6 @@ def read_zarr_string(
elem: ZarrArray,
*,
_reader: Reader,
- dataset_kwargs: MappingProxyType = MappingProxyType({}),
) -> str:
return str(elem[()])

31 changes: 20 additions & 11 deletions src/anndata/_io/specs/registry.py
@@ -1,6 +1,5 @@
from __future__ import annotations

- import inspect
import warnings
from collections.abc import Mapping
from dataclasses import dataclass
@@ -275,16 +274,28 @@ def read_elem(
iospec = get_spec(elem)
read_func = self.registry.get_read(type(elem), iospec, modifiers, reader=self)
if self.callback is None:
- return read_func(elem, dataset_kwargs=dataset_kwargs)
- if "dataset_kwargs" not in inspect.getfullargspec(self.callback)[0]:
+ return read_func(elem)
+ return self.callback(read_func, elem.name, elem, iospec=iospec)
+
+
+ class DaskReader(Reader):
+ @report_read_key_on_error
+ def read_elem(
+ self,
+ elem: StorageType,
+ modifiers: frozenset[str] = frozenset(),
+ chunks: tuple[int, ...] | None = None,
+ ) -> InMemoryElem:
+ """Read an element from a store. See exported function for more details."""
+
+ iospec = get_spec(elem)
+ read_func = self.registry.get_read(type(elem), iospec, modifiers, reader=self)
+ if self.callback is None:
warnings.warn(
- "Callback does not accept dataset_kwargs. Ignoring dataset_kwargs.",
+ "Dask reading does not use a callback. Ignoring callback.",
stacklevel=2,
)
- return self.callback(read_func, elem.name, elem, iospec=iospec)
- return self.callback(
- read_func, elem.name, elem, dataset_kwargs=dataset_kwargs, iospec=iospec
- )
+ return read_func(elem, chunks=chunks)


class Writer:
@@ -385,9 +396,7 @@ def read_elem_as_dask(
-------
DaskArray
"""
- return Reader(_LAZY_REGISTRY).read_elem(
- elem, dataset_kwargs={"chunks": chunks} if chunks is not None else {}
- )
+ return DaskReader(_LAZY_REGISTRY).read_elem(elem, chunks=chunks)


def write_elem(
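Reviewer's sketch: the `registry.py` change splits reading into an eager `Reader`, whose callback may intercept each element, and a `DaskReader` subclass that takes `chunks` and bypasses callbacks. The stand-in classes below are a simplified, stdlib-only illustration of that shape, not the library's code: registry lookup, `modifiers`, and `iospec` are omitted, and `fake_lazy_read` is a hypothetical read function. Note that this sketch warns when a callback is set, which is what the warning text suggests; the hunk in this commit tests `self.callback is None`, and the later commit 96b13a3 ("correct `self.callback` check") appears to address that.

```python
from __future__ import annotations

import warnings
from typing import Any, Callable


class Reader:
    """Simplified stand-in for the eager reader: a callback may intercept reads."""

    def __init__(
        self,
        read_func: Callable[..., Any],
        callback: Callable[..., Any] | None = None,
    ):
        self.read_func = read_func
        self.callback = callback

    def read_elem(self, elem: Any) -> Any:
        if self.callback is None:
            return self.read_func(elem)
        return self.callback(self.read_func, elem)


class DaskReader(Reader):
    """Simplified stand-in for the lazy reader: accepts `chunks`, ignores callbacks."""

    def read_elem(self, elem: Any, chunks: tuple[int, ...] | None = None) -> Any:
        if self.callback is not None:
            warnings.warn(
                "Dask reading does not use a callback. Ignoring callback.",
                stacklevel=2,
            )
        return self.read_func(elem, chunks=chunks)


def fake_lazy_read(elem: Any, chunks: tuple[int, ...] | None = None) -> dict:
    # Hypothetical read function that simply records what it was asked to do.
    return {"elem": elem, "chunks": chunks}


result = DaskReader(fake_lazy_read).read_elem("X", chunks=(100, 100))
print(result)  # {'elem': 'X', 'chunks': (100, 100)}
```

Keeping `chunks` on the subclass means the eager read functions in `methods.py` no longer need a `dataset_kwargs` parameter they never used.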
2 changes: 1 addition & 1 deletion src/anndata/_io/zarr.py
@@ -66,7 +66,7 @@ def read_zarr(store: str | Path | MutableMapping | zarr.Group) -> AnnData:
f = zarr.open(store, mode="r")

# Read with handling for backwards compat
- def callback(func, elem_name: str, elem, dataset_kwargs, iospec):
+ def callback(func, elem_name: str, elem, iospec):
if iospec.encoding_type == "anndata" or elem_name.endswith("/"):
return AnnData(
**{
2 changes: 1 addition & 1 deletion src/anndata/experimental/merge.py
@@ -134,7 +134,7 @@ def read_as_backed(group: ZarrGroup | H5Group):
BaseCompressedSparseDataset, Array or EAGER_TYPES are encountered.
"""

- def callback(func, elem_name: str, elem, dataset_kwargs, iospec):
+ def callback(func, elem_name: str, elem, iospec):
if iospec.encoding_type in SPARSE_MATRIX:
return sparse_dataset(elem)
elif iospec.encoding_type in EAGER_TYPES:
2 changes: 1 addition & 1 deletion tests/test_backed_sparse.py
@@ -70,7 +70,7 @@ def read_zarr_backed(path):
f = zarr.open(path, mode="r")

# Read with handling for backwards compat
- def callback(func, elem_name, elem, iospec, dataset_kwargs):
+ def callback(func, elem_name, elem, iospec):
if iospec.encoding_type == "anndata" or elem_name.endswith("/"):
return AnnData(
**{k: read_dispatched(v, callback) for k, v in elem.items()}
26 changes: 5 additions & 21 deletions tests/test_io_dispatched.py
@@ -3,7 +3,6 @@
import re

import h5py
- import pytest
import zarr
from scipy import sparse

@@ -19,7 +18,7 @@


def test_read_dispatched_w_regex():
- def read_only_axis_dfs(func, elem_name: str, elem, iospec, dataset_kwargs):
+ def read_only_axis_dfs(func, elem_name: str, elem, iospec):
if iospec.encoding_type == "anndata":
return func(elem)
elif re.match(r"^/((obs)|(var))?(/.*)?$", elem_name):
@@ -41,7 +40,7 @@ def read_only_axis_dfs(func, elem_name: str, elem, iospec, dataset_kwargs):
def test_read_dispatched_dask():
import dask.array as da

- def read_as_dask_array(func, elem_name: str, elem, iospec, dataset_kwargs):
+ def read_as_dask_array(func, elem_name: str, elem, iospec):
if iospec.encoding_type in {
"dataframe",
"csr_matrix",
@@ -78,29 +77,14 @@ def test_read_dispatched_null_case():

expected = read_elem(z)

- def callback(read_func, elem_name, x, dataset_kwargs, iospec):
+ def callback(read_func, elem_name, x, iospec):
return read_elem(x)

actual = read_dispatched(z, callback)

assert_equal(expected, actual)


- def test_read_dispatched_warns_with_no_dataset_kwargs():
- adata = gen_adata((100, 100))
- z = zarr.group()
- write_elem(z, "/", adata)
-
- def callback(read_func, elem_name, x, iospec):
- return read_elem(x)
-
- with pytest.warns(
- UserWarning,
- match="Callback does not accept dataset_kwargs. Ignoring dataset_kwargs.",
- ):
- read_dispatched(z, callback)
-
-
def test_write_dispatched_chunks():
from itertools import chain, repeat

@@ -182,11 +166,11 @@ def zarr_writer(func, store, k, elem, dataset_kwargs, iospec):
zarr_write_keys.append(k)
func(store, k, elem, dataset_kwargs=dataset_kwargs)

- def h5ad_reader(func, elem_name: str, elem, dataset_kwargs, iospec):
+ def h5ad_reader(func, elem_name: str, elem, iospec):
h5ad_read_keys.append(elem_name)
return func(elem)

- def zarr_reader(func, elem_name: str, elem, dataset_kwargs, iospec):
+ def zarr_reader(func, elem_name: str, elem, iospec):
zarr_read_keys.append(elem_name)
return func(elem)
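Reviewer's sketch: the branch removed from `Reader.read_elem`, together with the deleted test `test_read_dispatched_warns_with_no_dataset_kwargs`, relied on `inspect.getfullargspec(self.callback)[0]` to decide whether a callback accepts `dataset_kwargs`. The stdlib-only snippet below reproduces that kind of check in isolation; the helper name `accepts_dataset_kwargs` and the three example callbacks are hypothetical. It also shows one limitation of indexing `[0]`: that element lists only positional-or-keyword names, so a keyword-only `dataset_kwargs` parameter is not detected.

```python
import inspect


def accepts_dataset_kwargs(callback) -> bool:
    """Mirror of the removed check: look only at positional-or-keyword names."""
    return "dataset_kwargs" in inspect.getfullargspec(callback)[0]


def old_style(func, elem_name, elem, dataset_kwargs, iospec): ...
def new_style(func, elem_name, elem, iospec): ...
def keyword_only(func, elem_name, elem, *, dataset_kwargs, iospec): ...


print(accepts_dataset_kwargs(old_style))     # True
print(accepts_dataset_kwargs(new_style))     # False
print(accepts_dataset_kwargs(keyword_only))  # False: keyword-only names are in .kwonlyargs
```

With `chunks` passed directly to `DaskReader.read_elem`, read callbacks have the single signature `(func, elem_name, elem, iospec)` used throughout the updated tests, and no signature inspection is needed.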