
anndata - Annotated Data

Install via pip install anndata or conda install anndata -c bioconda.

Report issues and see the code on GitHub.

anndata is designed for simple, functional, high-level APIs in data analysis pipelines. In this context, it provides an efficient, scalable way of keeping track of data together with learned annotations, and it reduces the code overhead typically encountered when using a mostly object-oriented library such as scikit-learn.

Its primary use is currently in Scanpy, for which anndata was initially developed. Both packages were introduced in Genome Biology (2018).

See all releases here. The following lists selected improvements.

May 1, 2018: version 0.6

  1. compatibility with Seurat converter
  2. tremendous speedup for concatenate()

April 17, 2018: versions 0.5.9 - 0.5.10

  1. bug fix for deep copy of unstructured annotation after slicing

March 16, 2018: versions 0.5.1 - 0.5.8

  1. bug fix for reading HDF5 stored single-category annotations
  2. ‘outer join’ concatenation: adds zeros for concatenation of sparse data and NaNs for dense data
  3. better memory efficiency in loom exports
  4. consistency and documentation updates
  5. prettified print output


Versions 0.5.2, 0.5.3 and 0.5.4 had a bug in concatenate(): variable names were not assigned correctly. This was fixed in version 0.5.5.

February 9, 2018: version 0.5

  1. inform about duplicates in var_names and resolve them using var_names_make_unique()
  2. automatically remove unused categories after slicing
  3. read/write .loom files using loompy 2
  4. some IDE-backed improvements

December 29, 2017: version 0.4.2

  1. fixed read/write for a few text file formats
  2. read UMI tools files: read_umi_tools()

December 23, 2017: version 0.4

  1. towards a common file format for exchanging AnnData with packages such as Seurat and SCDE by reading and writing .loom files
  2. AnnData provides scalability beyond dataset sizes that fit into memory: see this blog post
  3. AnnData has a raw attribute that simplifies storing the data matrix when you consider it “raw”: see the clustering tutorial