anndata.experimental.read_lazy

anndata.experimental.read_lazy(store, *, load_annotation_index=True)

Lazily read in on-disk/in-cloud AnnData stores, including obs and var. No array data should need to be read into memory with the exception of ak.Array, scalars, and some older-encoding arrays.

Parameters:

store PathLike[str] | str | MutableMapping | Group | File | Group: A store-like object to be read in. If it is a zarr.Group, it should ideally be consolidated. If a path to an .h5ad file is provided, the opened HDF5 file is attached to the returned anndata.AnnData at its file attribute, and it is the user's responsibility to close it when done with the returned object. For this reason, passing an h5py.File as the store argument is recommended when working with h5 files. The file must remain open for at least as long as the returned object is in use.
load_annotation_index bool (default: True): Whether to load the real index of obs and var into memory. If False, a range index is used for the {obs,var} xarray.Dataset so that the index is not loaded into memory. In that case the real index is inserted as {obs,var}_names in the object but is not one of the coords, which prevents read operations on it. Access to adata.obs.index then gives only the dummy index, not the "real" index that is file-backed.

Return type:

AnnData

Returns:

A lazily read-in AnnData object.

Examples

Preparing example objects

>>> import anndata as ad
>>> import pooch
>>> import scanpy as sc
>>> base_url = "https://datasets.cellxgene.cziscience.com"
>>> # To update hashes: pooch.retrieve(url, known_hash=None) prints the new hash
>>> def get_cellxgene_data(id_: str, hash_: str):
...     return pooch.retrieve(
...         f"{base_url}/{id_}.h5ad",
...         known_hash=hash_,
...         fname=f"{id_}.h5ad",
...         path=sc.settings.datasetdir,
...     )
>>> path_b_cells = get_cellxgene_data(
...     "a93eab58-3d82-4b61-8a2f-d7666dcdb7c4",
...     "sha256:dac90fe2aa8b78aee2c1fc963104592f8eff7b873ca21d01a51a5e416734651c",
... )
>>> path_fetal = get_cellxgene_data(
...     "d170ff04-6da0-4156-a719-f8e1bbefbf53",
...     "sha256:d497eebca03533919877b6fc876e8c9d8ba063199ddc86dd9fbcb9d1d87a3622",
... )
>>> b_cells_adata = ad.experimental.read_lazy(path_b_cells)
>>> fetal_adata = ad.experimental.read_lazy(path_fetal)
>>> print(b_cells_adata)
AnnData object with n_obs × n_vars = 146 × 33452
    obs: 'donor_id', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', ...
>>> print(fetal_adata)
AnnData object with n_obs × n_vars = 344 × 15585
    obs: 'nCount_Spatial', 'nFeature_Spatial', 'Cluster', 'adult_pred_type'...

This functionality is compatible with anndata.concat():

>>> ad.concat([b_cells_adata, fetal_adata], join="outer")
AnnData object with n_obs × n_vars = 490 × 33452
    obs: 'donor_id', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id'...
