anndata.experimental.read_lazy


anndata.experimental.read_lazy(store, *, load_annotation_index=True)

Lazily read in on-disk or in-cloud AnnData stores, including obs and var. No array data should need to be read into memory, with the exception of awkward arrays (ak.Array), scalars, and some arrays written with older encodings.

Parameters:
store str | Path | MutableMapping | Group | Dataset

A store-like object to be read in. If it is a zarr.Group, it should preferably have consolidated metadata.

load_annotation_index bool (default: True)

Whether to load the real {obs,var} index into memory. If False, a range index is used for the {obs,var} xarray.Dataset instead, so that the index does not have to be loaded into memory: the real, file-backed index is still inserted as {obs,var}_names in the object, but it is not one of the coords, thereby preventing read operations. Accessing adata.obs.index will likewise return only the dummy index, not the real one.

Return type:

AnnData

Returns:

A lazily read-in AnnData object.

Examples

Preparing example objects

>>> import anndata as ad
>>> from urllib.request import urlretrieve
>>> import scanpy as sc
>>> base_url = "https://datasets.cellxgene.cziscience.com"
>>> def get_cellxgene_data(id_: str):
...     out_path = sc.settings.datasetdir / f"{id_}.h5ad"
...     if out_path.exists():
...         return out_path
...     file_url = f"{base_url}/{id_}.h5ad"
...     sc.settings.datasetdir.mkdir(parents=True, exist_ok=True)
...     urlretrieve(file_url, out_path)
...     return out_path
>>> path_b_cells = get_cellxgene_data("a93eab58-3d82-4b61-8a2f-d7666dcdb7c4")
>>> path_fetal = get_cellxgene_data("d170ff04-6da0-4156-a719-f8e1bbefbf53")
>>> b_cells_adata = ad.experimental.read_lazy(path_b_cells)
>>> fetal_adata = ad.experimental.read_lazy(path_fetal)
>>> print(b_cells_adata)
AnnData object with n_obs × n_vars = 146 × 33452
    obs: 'donor_id', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', ...
>>> print(fetal_adata)
AnnData object with n_obs × n_vars = 344 × 15585
    obs: 'nCount_Spatial', 'nFeature_Spatial', 'Cluster', 'adult_pred_type'...

This functionality is compatible with anndata.concat():

>>> ad.concat([b_cells_adata, fetal_adata], join="outer")
AnnData object with n_obs × n_vars = 490 × 33452
    obs: 'donor_id', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id'...