anndata - Annotated data
anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.
Discuss development on GitHub.
Read the documentation.
Ask questions on the scverse Discourse.
Install via
pip install anndata
or
conda install anndata -c conda-forge
See Scanpy’s documentation for usage related to single cell data. anndata was initially built for Scanpy.
If you use anndata in your work, please cite the anndata pre-print as follows:
anndata: Annotated data
Isaac Virshup, Sergei Rybakov, Fabian J. Theis, Philipp Angerer, F. Alexander Wolf
bioRxiv 2021 Dec 19. doi: 10.1101/2021.12.16.473007.
You can cite the scverse publication as follows:
The scverse project provides a computational ecosystem for single-cell omics data analysis
Isaac Virshup, Danila Bredikhin, Lukas Heumos, Giovanni Palla, Gregor Sturm, Adam Gayoso, Ilia Kats, Mikaela Koutrouli, Scverse Community, Bonnie Berger, Dana Pe’er, Aviv Regev, Sarah A. Teichmann, Francesca Finotello, F. Alexander Wolf, Nir Yosef, Oliver Stegle & Fabian J. Theis
Nat Biotechnol. 2023 Apr 10. doi: 10.1038/s41587-023-01733-8.
News
Muon paper published 2022-02-02
Muon has been published in Genome Biology [^cite_bredikhin22].
Muon is a framework for multimodal data built on top of AnnData.
COVID-19 datasets distributed as h5ad 2020-04-01
In a joint initiative, the Wellcome Sanger Institute, the Human Cell Atlas, and the CZI distribute datasets related to COVID-19 via anndata’s h5ad files: covid19cellatlas.org.
Latest additions
Version 0.9
0.9.1 2023-04-11
Bugfix
0.9.0 2023-04-11
Features
Added experimental support for dask arrays #813 @syelman @rahulbshrestha
obsm, varm and uns can now hold AwkwardArrays #647 @giovp, @grst, @ivirshup
Added experimental functions anndata.experimental.read_dispatched() and anndata.experimental.write_dispatched() which allow customizing IO with a callback #873 @ilan-gold @ivirshup
Better error messages during IO #734 @flying-sheep, @ivirshup
Unordered categorical columns are no longer cast to object during anndata.concat() #763 @ivirshup
Documentation
New tutorials for experimental features
File format description now includes a more formal specification #882 @ivirshup
Interoperability: new page on interoperability with other packages #831 @ivirshup
Expanded docstring with more documentation for the backed argument of anndata.read_h5ad() #812 @jeskowagner
Documented how to use alternative compression methods for the h5ad file format, see AnnData.write_h5ad() #857 @nigeil
Breaking changes
The AnnData dtype argument no longer defaults to float32 #854 @ivirshup
Previously deprecated force_dense argument of AnnData.write_h5ad() has been removed. #855 @ivirshup
Previously deprecated behaviour around storing adjacency matrices in uns has been removed #866 @ivirshup
Other updates
Deprecations
AnnData.concatenate() is now deprecated in favour of anndata.concat() #845 @ivirshup
Bug fixes
Fixed order dependent outer concatenation bug #904 @ivirshup, reported by @szalata
Fixed bug in renaming categories #790 @ivirshup, reported by @perrin-isir
Fixed IO bug when keys in uns ended in _categories #806 @ivirshup, reported by @Hrovatin
Fixed raw.to_adata not populating obs aligned values when raw was assigned through the setter #939 @ivirshup
Version 0.8
0.8.1 the future
Bug fixes
Fix warning from rename_categories #790 I Virshup
Remove backwards compat checks for categories in uns when we can tell the file is new enough #790 I Virshup
Categorical arrays are now created with a python bool instead of a numpy.bool_ #856
Documentation
0.8.0 14th March, 2022
IO Specification
Warning
The on disk format of AnnData objects has been updated with this release.
Previous releases of anndata will not be able to read all files written by this version.
For discussion of possible future solutions to this issue, see #698
Internal handling of IO has been overhauled.
This should make it much easier to support new datatypes, use partial access, and use AnnData internally in other formats.
Each element should be tagged with an encoding_type and encoding_version. See updated docs on the file format
Support for nullable integer and boolean data arrays. More data types to come!
Experimental support for low level access to the IO API via read_elem() and write_elem()
Features
Added PyTorch dataloader AnnLoader and lazy concatenation object AnnCollection. See the tutorials #416 S Rybakov
Compatibility with h5ad files written from Julia #569 I Kats
Many logging messages that should have been warnings are now warnings #650 I Virshup
Significantly more efficient anndata.read_umi_tools() #661 I Virshup
Fixed deepcopy of a copy of a view retaining sparse matrix view mixin type #670 M Klein
In many cases X can now be None #463 R Cannoodt #677 I Virshup. Remaining work is documented in #467.
Removed hard xlrd dependency I Virshup
obs and var dataframes are no longer copied by default on AnnData instantiation #371 I Virshup
Bug fixes
Fixed issue where .copy was creating sparse matrices views when copying #670 michalk8
Fixed issue where .X matrix read in from zarr would always have float32 values #701 I Virshup
Raw.to_adata now includes obsp in the output #404 G Eraslan
Dependencies
xlrd dropped as a hard dependency
Now requires h5py v3.0.0 or newer