anndata.experimental.read_elem_as_dask

anndata.experimental.read_elem_as_dask#

anndata.experimental.read_elem_as_dask(elem, chunks=None)[source]#

Read an element from a store lazily.

Assumes that the element is encoded using the anndata encoding. This function will determine the encoded type using the encoding metadata stored in elem’s attributes.

Parameters:
elem Array | Dataset | Group | Group

The stored element.

chunks tuple[int, ...] | None (default: None)

length n, the same n as the size of the underlying array. Note that the minor axis dimension must match the shape for sparse. Defaults to (1000, adata.shape[1]) for CSR sparse, (adata.shape[0], 1000) for CSC sparse, and the on-disk chunking otherwise for dense. Can use -1 or None to indicate use of the size of the corresponding dimension.

optional

length n, the same n as the size of the underlying array. Note that the minor axis dimension must match the shape for sparse. Defaults to (1000, adata.shape[1]) for CSR sparse, (adata.shape[0], 1000) for CSC sparse, and the on-disk chunking otherwise for dense. Can use -1 or None to indicate use of the size of the corresponding dimension.

Return type:

DaskArray

Returns:

DaskArray

Examples

Setting up our example:

>>> from scanpy.datasets import pbmc3k
>>> import tempfile
>>> import anndata as ad
>>> import zarr
>>> tmp_path = tempfile.gettempdir()
>>> zarr_path = tmp_path + "/adata.zarr"
>>> adata = pbmc3k()
>>> adata.layers["dense"] = adata.X.toarray()
>>> adata.write_zarr(zarr_path)

Reading a sparse matrix from a zarr store lazily, with custom chunk size and default:

>>> g = zarr.open(zarr_path)
>>> adata.X = ad.experimental.read_elem_as_dask(g["X"])
>>> adata.X
dask.array<make_dask_chunk, shape=(2700, 32738), dtype=float32, chunksize=(1000, 32738), chunktype=scipy.csr_matrix>
>>> adata.X = ad.experimental.read_elem_as_dask(
...     g["X"], chunks=(500, adata.shape[1])
... )
>>> adata.X
dask.array<make_dask_chunk, shape=(2700, 32738), dtype=float32, chunksize=(500, 32738), chunktype=scipy.csr_matrix>

Reading a dense matrix from a zarr store lazily:

>>> adata.layers["dense"] = ad.experimental.read_elem_as_dask(g["layers/dense"])
>>> adata.layers["dense"]
dask.array<from-zarr, shape=(2700, 32738), dtype=float32, chunksize=(169, 2047), chunktype=numpy.ndarray>

Making a new anndata object from on-disk, with custom chunks:

>>> adata = ad.AnnData(
...     obs=ad.io.read_elem(g["obs"]),
...     var=ad.io.read_elem(g["var"]),
...     uns=ad.io.read_elem(g["uns"]),
...     obsm=ad.io.read_elem(g["obsm"]),
...     varm=ad.io.read_elem(g["varm"]),
... )
>>> adata.X = ad.experimental.read_elem_as_dask(
...     g["X"], chunks=(500, adata.shape[1])
... )
>>> adata.layers["dense"] = ad.experimental.read_elem_as_dask(g["layers/dense"])

We also support using -1 and None as a chunk size to signify the reading the whole axis:

>>> adata.X = ad.experimental.read_elem_as_dask(g["X"], chunks=(500, -1))
>>> adata.X = ad.experimental.read_elem_as_dask(g["X"], chunks=(500, None))