

anndata - Annotated data

anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features, including sparse data support, lazy operations, and a PyTorch interface.

anndata is part of the scverse project (website, governance) and is fiscally sponsored by NumFOCUS. Please consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.

Citation

If you use anndata in your work, please cite the anndata pre-print as follows:

anndata: Annotated data

Isaac Virshup, Sergei Rybakov, Fabian J. Theis, Philipp Angerer, F. Alexander Wolf

bioRxiv 2021 Dec 19. doi: 10.1101/2021.12.16.473007.

You can cite the scverse publication as follows:

The scverse project provides a computational ecosystem for single-cell omics data analysis

Isaac Virshup, Danila Bredikhin, Lukas Heumos, Giovanni Palla, Gregor Sturm, Adam Gayoso, Ilia Kats, Mikaela Koutrouli, Scverse Community, Bonnie Berger, Dana Pe’er, Aviv Regev, Sarah A. Teichmann, Francesca Finotello, F. Alexander Wolf, Nir Yosef, Oliver Stegle & Fabian J. Theis

Nat Biotechnol. 2023 Apr 10. doi: 10.1038/s41587-023-01733-8.

Latest additions

Version 0.11

0.11.0 the future

Features

  • Add a settings object with methods for altering internally used options, such as checking for uniqueness of the obs index #1270 @ilan-gold

  • Add a remove_unused_categories option to anndata.settings to override the current behavior. The default is True, i.e., the previous behavior. See the documentation for usage. #1340 @ilan-gold
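
The remove_unused_categories option concerns categorical annotation columns that lose some of their values after subsetting. The pandas-only sketch below does not call anndata; it only illustrates the underlying behavior: slicing a categorical keeps every original category, and Series.cat.remove_unused_categories() drops the categories that no longer occur, which roughly corresponds to the default (True) behavior described above.

```python
import pandas as pd

# A categorical annotation column with three categories (made-up values).
cell_type = pd.Series(pd.Categorical(["B", "T", "NK", "B"]), name="cell_type")

# Keep only the B cells: pandas still lists every original category.
subset = cell_type[cell_type == "B"]
print(list(subset.cat.categories))  # ['B', 'NK', 'T']

# Drop categories that no longer occur in the subset.
trimmed = subset.cat.remove_unused_categories()
print(list(trimmed.cat.categories))  # ['B']
```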

Bugfix

Documentation

Performance

Breaking

  • Removed deprecated modules anndata.core and anndata.readwrite #1197 @ivirshup

Version 0.10

0.10.7 the future

Bugfix

  • Handle an upstream numcodecs bug where read-only string arrays cannot be encoded #1421 @ivirshup
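
A common way to sidestep encoders that reject read-only inputs is to hand them a writeable copy. The sketch below shows that generic pattern in NumPy; ensure_writeable is a hypothetical helper for illustration, not part of anndata or numcodecs, and this is not necessarily how #1421 was implemented.

```python
import numpy as np

def ensure_writeable(arr: np.ndarray) -> np.ndarray:
    """Return arr itself if it is writeable, otherwise a writeable copy (hypothetical helper)."""
    if arr.flags.writeable:
        return arr
    return arr.copy()  # copies are always writeable

names = np.array(["g1", "g2", "g3"], dtype=object)
names.flags.writeable = False  # simulate a read-only string array

safe = ensure_writeable(names)
print(safe.flags.writeable)  # True
```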

Documentation

Performance

0.10.6 2024-03-11

Bugfix

  • Defer import of zarr in test helpers, as the scanpy CI job relies on them #1343 @ilan-gold

  • Writing a dataframe with non-unique column names now throws an error, instead of silently overwriting #1335 @ivirshup

  • Bring optimization from #1233 to indexing on the whole AnnData object, not just the sparse dataset itself #1365 @ilan-gold

  • Fix mean slice length checking so that the improved performance is actually used when indexing backed sparse matrices with boolean masks along their major axis #1366 @ilan-gold

  • Fix overflow when writing dask arrays with sparse chunks: dask arrays are now always written with 64-bit indptr and indices, and an overflow check was added to the .append method of on-disk sparse structures #1348 @ivirshup

  • Modify the ValueError message for an invalid .X during construction to show a more helpful list instead of the ambiguous __name__ #1395 @eroell

  • Pin array-api-compat!=1.5 to avoid its incorrect implementation of asarray #1411 @ivirshup
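
The overflow fix in #1348 rests on a simple constraint: in a compressed sparse matrix, indptr holds cumulative counts of stored values, so once the total exceeds the largest 32-bit integer, a 32-bit indptr can no longer represent it. A minimal NumPy sketch of such a check when appending a chunk of rows; append_indptr is a hypothetical helper, not anndata's implementation.

```python
import numpy as np

def append_indptr(indptr: np.ndarray, chunk_indptr: np.ndarray) -> np.ndarray:
    """Append a chunk's indptr (starting at 0) to an existing indptr (hypothetical helper).

    Raises OverflowError if the new total number of stored values does not fit
    in the existing indptr dtype, instead of silently wrapping around.
    """
    offset = int(indptr[-1])
    new_total = offset + int(chunk_indptr[-1])
    if new_total > np.iinfo(indptr.dtype).max:
        raise OverflowError(
            f"{new_total} stored values do not fit in {indptr.dtype}; use a 64-bit indptr"
        )
    # Shift the chunk's offsets by the values already stored, skipping its leading 0.
    shifted = chunk_indptr[1:].astype(np.int64) + offset
    return np.concatenate([indptr, shifted.astype(indptr.dtype)])

small = append_indptr(np.array([0, 2, 5], dtype=np.int32), np.array([0, 1, 3], dtype=np.int32))
print(small.tolist())  # [0, 2, 5, 6, 8]
```

With a 64-bit indptr, the same append succeeds far beyond the 32-bit limit, which is the motivation for always writing 64-bit indptr and indices.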

Documentation

  • Update and fix the type hints and docstring of the .to_df method #1402 @WeilerP

Development

  • anndata’s CI now tests against the minimum versions of its dependencies. As a result, several dependencies had their minimum required version bumped. See the diff for details #1314 @ivirshup

  • anndata now tests against Python 3.12 #1373 @ivirshup

0.10.5 2024-01-25

Bugfix

  • Fix outer concatenation along variables when only a subset of objects had an entry in layers #1291 @ivirshup

  • Fix comparison of >2d arrays in uns during concatenation #1300 @ivirshup

  • Fix IO with awkward array version 2.5.2 #1328 @ivirshup

  • Fix bug (introduced in 0.10.4) where indexing an AnnData with list[bool] would return the wrong result #1332 @ivirshup

Documentation

Performance

  • BaseCompressedSparseDataset’s indptr is now cached #1266 @ilan-gold

  • Improved performance when indexing backed sparse matrices with boolean masks along their major axis #1233 @ilan-gold
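
Boolean masks along the major axis are the favorable case for compressed sparse data: in CSR format, row i's values occupy one contiguous range data[indptr[i]:indptr[i + 1]], so selecting rows only requires gathering those ranges. The sketch below shows this idea in memory and checks it against SciPy; select_rows is a hypothetical illustration of the data layout, not the code from #1233.

```python
import numpy as np
from scipy import sparse

def select_rows(mat: sparse.csr_matrix, mask: np.ndarray) -> sparse.csr_matrix:
    """Select CSR rows with a boolean mask by gathering contiguous indptr ranges."""
    rows = np.flatnonzero(mask)
    starts, ends = mat.indptr[rows], mat.indptr[rows + 1]
    # Positions in data/indices of every stored value in the selected rows.
    if len(rows) == 0:
        take = np.array([], dtype=np.int64)
    else:
        take = np.concatenate([np.arange(s, e) for s, e in zip(starts, ends)])
    new_indptr = np.concatenate([[0], np.cumsum(ends - starts)])
    return sparse.csr_matrix(
        (mat.data[take], mat.indices[take], new_indptr),
        shape=(len(rows), mat.shape[1]),
    )

dense = np.array([[0, 1, 0], [2, 0, 3], [0, 0, 0], [4, 5, 6]], dtype=float)
mat = sparse.csr_matrix(dense)
mask = np.array([True, False, True, True])
sub = select_rows(mat, mask)
print(sub.toarray())
```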

0.10.4 2024-01-04

Bugfix

  • Only use Categorical.map(na_action=…) with pandas ≥2.1, where it is actually supported #1226 @flying-sheep

  • Add AnnData.__sizeof__() support for backed datasets #1230 @Neah-Ko

  • adata[:, []] now returns an AnnData object empty on the appropriate dimensions instead of erroring #1243 @ilan-gold

  • adata.X[mask] works in newer numpy versions when X is backed #1255 @ilan-gold

  • adata.X[...] fixed for X as a BaseCompressedSparseDataset with zarr backend #1265 @ilan-gold

  • Improve read/write error reporting #1273 @flying-sheep

Documentation

0.10.3 2023-10-31

Bugfix

  • Prevent pandas from causing infinite recursion when setting a slice of a categorical column #1211 @flying-sheep

Documentation

  • Stop showing “Support for Awkward Arrays is currently experimental” warnings when reading, concatenating, slicing, or transposing AnnData objects #1182 @flying-sheep

Other updates

0.10.2 2023-10-11

Bugfix

Performance

0.10.1 2023-10-08

Bugfix

  • Fix ad.concat erroring when concatenating a categorical and object column #1171 @ivirshup

0.10.0 2023-10-06

Features

GPU Support

Out of core

Improved errors and warnings

  • Improved error messages when combining dataframes with duplicated column names #1029 @ivirshup

  • Improved warnings when modifying views of AlignedMappings #1016 @flying-sheep @ivirshup

  • AnnDataReadErrors have been removed. The original error is now thrown with additional information in a note #1055 @ivirshup
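
The duplicated-column-name errors (#1029 here, and the write-time check from #1335) guard against a pandas feature: a DataFrame may hold several columns with the same label, which would otherwise collide when merged or written under their names. A pandas sketch of detecting this up front; check_unique_columns is a hypothetical helper, not anndata's code.

```python
import pandas as pd

def check_unique_columns(df: pd.DataFrame) -> None:
    """Raise ValueError listing any column labels that occur more than once (hypothetical helper)."""
    dupes = df.columns[df.columns.duplicated()].unique().tolist()
    if dupes:
        raise ValueError(f"DataFrame has duplicated column names: {dupes}")

ok = pd.DataFrame({"a": [1], "b": [2]})
check_unique_columns(ok)  # passes silently

bad = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])
try:
    check_unique_columns(bad)
except ValueError as err:
    print(err)  # DataFrame has duplicated column names: ['a']
```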

Documentation

Breaking changes

Other updates

Deprecations

Bug fixes

See Release notes for more.

News

Muon paper published 2022-02-02

Muon has been published in Genome Biology [^cite_bredikhin22]. Muon is a framework for multimodal data built on top of AnnData.

Check out Muon and its data structure MuData.

COVID-19 datasets distributed as h5ad 2020-04-01

In a joint initiative, the Wellcome Sanger Institute, the Human Cell Atlas, and the CZI distribute datasets related to COVID-19 via anndata’s h5ad files: covid19cellatlas.org.