anndata - Annotated data
anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.
Discuss development on GitHub.
Read the documentation.
Ask questions on the scverse Discourse.
Install via
pip install anndata
or
conda install anndata -c conda-forge
Consider citing the anndata paper.
See Scanpy’s documentation for usage related to single cell data. anndata was initially built for Scanpy.
News
Muon paper published 2022-02-02
Muon has been published in Genome Biology [Bredikhin22].
Muon is a framework for multimodal data built on top of AnnData.
COVID-19 datasets distributed as h5ad
2020-04-01
In a joint initiative, the Wellcome Sanger Institute, the Human Cell Atlas, and the CZI distribute datasets related to COVID-19 via anndata’s h5ad files: covid19cellatlas.org.
Latest additions
Version 0.9
0.9.0 the future
Features
Unordered categorical columns are no longer cast to object during anndata.concat() #763 @ivirshup
Added support for dask arrays #813 @syelman @rahulbshrestha
obsm, varm and uns can now hold AwkwardArrays #647 @giovp, @grst, @ivirshup
Better error messages during IO #734 @flying-sheep, @ivirshup
Documentation
New tutorial on using dask.array with AnnData #886 @syelman
File format description now includes a more formal specification #882 @ivirshup
Expanded the docstring and documentation for the backed argument of anndata.read_h5ad() #812 @jeskowagner
Documented how to use alternative compression methods for the h5ad file format, see AnnData.write_h5ad() #857 @nigeil
Breaking changes
Bug fixes
Updates
Deprecations
AnnData.concatenate() is now deprecated in favour of anndata.concat() #845 @ivirshup
The previously deprecated force_dense argument of AnnData.write_h5ad() has been removed. #855 @ivirshup
Version 0.8
0.8.1 the future
Bug fixes
Fix warning from rename_categories #790 I Virshup
Remove backwards compat checks for categories in uns when we can tell the file is new enough #790 I Virshup
Categorical arrays are now created with a python bool instead of a numpy.bool_ #856
Documentation
0.8.0 14th March, 2022
IO Specification
Warning
The on disk format of AnnData objects has been updated with this release.
Previous releases of anndata will not be able to read all files written by this version.
For discussion of possible future solutions to this issue, see #698
Internal handling of IO has been overhauled.
This should make it much easier to support new datatypes, use partial access, and use AnnData internally in other formats.
Each element should be tagged with an encoding_type and encoding_version. See updated docs on the file format
Support for nullable integer and boolean data arrays. More data types to come!
Experimental support for low level access to the IO API via read_elem() and write_elem()
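To show why tagging each element with an encoding_type and encoding_version makes new datatypes easier to support, here is a small standard-library sketch of tag-based dispatch. It is not anndata's implementation. The registry, the function names register_reader and read_element, the dictionary layout, and the version strings are all hypothetical stand-ins for on-disk elements.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry mapping (encoding_type, encoding_version) to a reader.
READERS: Dict[Tuple[str, str], Callable[[dict], Any]] = {}


def register_reader(encoding_type: str, encoding_version: str):
    """Decorator that registers a reader for one tagged element kind."""
    def decorator(func: Callable[[dict], Any]) -> Callable[[dict], Any]:
        READERS[(encoding_type, encoding_version)] = func
        return func
    return decorator


@register_reader("array", "0.2.0")
def read_array(elem: dict) -> list:
    # A plain array is returned unchanged.
    return list(elem["data"])


@register_reader("nullable-integer", "0.1.0")
def read_nullable_integer(elem: dict) -> list:
    # Values plus a mask; a True mask entry marks a missing value.
    return [None if m else v for v, m in zip(elem["values"], elem["mask"])]


def read_element(elem: dict) -> Any:
    """Pick the reader from the element's tags instead of guessing its type."""
    key = (elem["encoding_type"], elem["encoding_version"])
    if key not in READERS:
        raise ValueError(f"No reader registered for {key}")
    return READERS[key](elem)


# In-memory stand-in for a stored element (illustrative layout only).
stored = {
    "encoding_type": "nullable-integer",
    "encoding_version": "0.1.0",
    "values": [1, 2, 3],
    "mask": [False, True, False],
}
decoded = read_element(stored)
print(decoded)
```

With this kind of dispatch, supporting a new datatype means registering one more reader rather than changing the code that walks the file.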
Features
Added PyTorch dataloader AnnLoader and lazy concatenation object AnnCollection. See the tutorials #416 S Rybakov
Compatibility with h5ad files written from Julia #569 I Kats
Many logging messages that should have been warnings are now warnings #650 I Virshup
Significantly more efficient anndata.read_umi_tools() #661 I Virshup
Fixed deepcopy of a copy of a view retaining sparse matrix view mixin type #670 M Klein
In many cases X can now be None #463 R Cannoodt #677 I Virshup. Remaining work is documented in #467.
Removed hard xlrd dependency I Virshup
obs and var dataframes are no longer copied by default on AnnData instantiation #371 I Virshup
Bug fixes
Fixed issue where .copy was creating sparse matrix views when copying #670 michalk8
Fixed issue where .X matrix read in from zarr would always have float32 values #701 I Virshup
Raw.to_adata() now includes obsp in the output #404 G Eraslan
Dependencies
xlrd dropped as a hard dependency
Now requires h5py v3.0.0 or newer