anndata.experimental.read_lazy
- anndata.experimental.read_lazy(store, *, load_annotation_index=True)
Lazily read in on-disk/in-cloud AnnData stores, including obs and var. No array data should need to be read into memory, with the exception of ak.Array, scalars, and some older-encoding arrays.
- Parameters:
  - store
    PathLike[str] | str | MutableMapping | Group | Dataset
    A store-like object to be read in. If it is a zarr.Group, it is best for it to be consolidated.
  - load_annotation_index
    bool (default: True)
    Whether to load the real index of the {obs,var} xarray.Dataset into memory. If False, a range index is used instead so that the real index is not loaded: the real index will be inserted as {obs,var}_names in the object but will not be one of the coords, thereby preventing read operations. Access to adata.obs.index will then give only the dummy index, not the "real" index that is file-backed.
- Return type:
  AnnData
- Returns:
  A lazily read-in AnnData object.
Examples
Preparing example objects
>>> import anndata as ad
>>> from urllib.request import urlretrieve
>>> import scanpy as sc
>>> base_url = "https://datasets.cellxgene.cziscience.com"
>>> def get_cellxgene_data(id_: str):
...     out_path = sc.settings.datasetdir / f"{id_}.h5ad"
...     if out_path.exists():
...         return out_path
...     file_url = f"{base_url}/{id_}.h5ad"
...     sc.settings.datasetdir.mkdir(parents=True, exist_ok=True)
...     urlretrieve(file_url, out_path)
...     return out_path
>>> path_b_cells = get_cellxgene_data("a93eab58-3d82-4b61-8a2f-d7666dcdb7c4")
>>> path_fetal = get_cellxgene_data("d170ff04-6da0-4156-a719-f8e1bbefbf53")
>>> b_cells_adata = ad.experimental.read_lazy(path_b_cells)
>>> fetal_adata = ad.experimental.read_lazy(path_fetal)
>>> print(b_cells_adata)
AnnData object with n_obs × n_vars = 146 × 33452
    obs: 'donor_id', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id', ...
>>> print(fetal_adata)
AnnData object with n_obs × n_vars = 344 × 15585
    obs: 'nCount_Spatial', 'nFeature_Spatial', 'Cluster', 'adult_pred_type'...
This functionality is compatible with anndata.concat():
>>> ad.concat([b_cells_adata, fetal_adata], join="outer")
AnnData object with n_obs × n_vars = 490 × 33452
    obs: 'donor_id', 'self_reported_ethnicity_ontology_term_id', 'organism_ontology_term_id'...