Release notes#
Version 0.11#
0.11.0rc2.dev10+g34e9783 2024-09-17#
Development Process#
Add extra dask dependency for installation, i.e., pip install anndata[dask] @ilan-gold (#1677)
0.11.0rc1 2024-09-04#
Breaking changes#
Removed deprecated modules anndata.core and anndata.readwrite @ivirshup (#1197)
No longer export sparse_dataset from anndata.experimental, instead exporting anndata.sparse_dataset() @ilan-gold (#1642)
Move RWAble and InMemoryElem out of experimental, renaming RWAble to AxisStorable and InMemoryElem to RWAble @ilan-gold (#1643)
Development Process#
Documentation#
Correct anndata.AnnData.X type to include CSRDataset and CSCDataset as possible types and begin deprecation process for non-csr/csc scipy.sparse.spmatrix types in anndata.AnnData.X @ilan-gold (#1616)
Features#
scipy.sparse.csr_array and scipy.sparse.csc_array are now supported when constructing AnnData objects @ilan-gold @isaac-virshup (#1028)
Allow axis parameter of e.g. anndata.concat() to accept 'obs' and 'var' @flying-sheep (#1244)
Add settings object with methods for altering internally-used options, like checking for uniqueness on obs’ index @ilan-gold (#1270)
Add shall_remove_unused_categories option to anndata.settings to override current behavior @ilan-gold (#1340)
Add anndata.experimental.read_elem_as_dask() function to handle i/o with sparse and dense arrays @ilan-gold (#1469)
Add ability to convert strings to categoricals on write in write_h5ad() and write_zarr() via convert_strings_to_categoricals parameter @falexwolf (#1474)
Add shall_check_uniqueness option to anndata.settings to override current behavior @ilan-gold (#1507)
Add functionality to write from GPU dask.array.Array to disk @ilan-gold (#1550)
Read and write support for nullable string arrays (pandas.arrays.StringArray). Use pandas’ mode.string_storage option (see pandas’ Options and settings) to control which storage mode is used when reading dtype="string" columns. @flying-sheep (#1558)
Export write_elem() and read_elem() directly from the main package instead of experimental @ilan-gold (#1598)
Allow reading sparse data (via read_elem() or sparse_dataset()) into either scipy.sparse.csr_array or scipy.sparse.csc_array via anndata.settings.shall_use_sparse_array_on_read @ilan-gold (#1633)
Version 0.10#
0.10.9 2024-08-28#
Bug fixes#
Fix writing large number of columns for h5 files @ilan-gold @selmanozleyen (#1147)
Add warning for setting X on a view with repeated indices @ilan-gold (#1501)
Coerce numpy.matrix classes to arrays when trying to store them in AnnData @flying-sheep (#1516)
Fix for setting a dense X view with a sparse matrix @ilan-gold (#1532)
Upper bound numpy for gpu installation on account of cupy/cupy#8391 @ilan-gold (#1540)
Upper bound dask on account of #1579 @ilan-gold (#1580)
Ensure setting pandas.DataFrame.index on a view of an AnnData instantiates the DataFrame from the view @ilan-gold (#1586)
Disallow using DataFrames with multi-index columns @ilan-gold (#1589)
Development Process#
Documentation#
Add callback typing for read_dispatched() and write_dispatched() @ilan-gold (#1557)
Performance#
Support for concat_on_disk outer join @ilan-gold (#1504)
0.10.8 2024-06-20#
Bug fixes#
Write out 64bit indptr when appropriate for concat_on_disk() #1493 @ilan-gold
Support for Numpy 2 #1499 @flying-sheep
Fix sparse_dataset() docstring test on account of new scipy version #1514 @ilan-gold
Documentation#
Improved example for sparse_dataset() #1468 @ivirshup
0.10.7 2024-04-09#
Bug fixes#
Handle upstream numcodecs bug where read-only string arrays cannot be encoded @ivirshup #1421
Use in-memory sparse matrix directly to fix compatibility with scipy 1.13 @ilan-gold #1435
Performance#
Remove vindex for subsetting dask.array.Array because of its slowness and memory consumption @ilan-gold #1432
0.10.6 2024-03-11#
Bug fixes#
Defer import of zarr in test helpers, as scanpy CI job relies on them #1343 @ilan-gold
Writing a dataframe with non-unique column names now throws an error, instead of silently overwriting #1335 @ivirshup
Bring optimization from #1233 to indexing on the whole AnnData object, not just the sparse dataset itself #1365 @ilan-gold
Fix mean slice length checking to use improved performance when indexing backed sparse matrices with boolean masks along their major axis #1366 @ilan-gold
Fixed overflow occurring when writing dask arrays with sparse chunks by always writing dask arrays with 64 bit indptr and indices, and adding an overflow check to .append method of sparse on disk structures #1348 @ivirshup
Modified ValueError message for invalid .X during construction to show more helpful list instead of ambiguous __name__ #1395 @eroell
Pin array-api-compat!=1.5 to avoid incorrect implementation of asarray #1411 @ivirshup
Documentation#
Development#
0.10.5 2024-01-25#
Bug fixes#
Fix outer concatenation along variables when only a subset of objects had an entry in layers #1291 @ivirshup
Fix comparison of >2d arrays in uns during concatenation #1300 @ivirshup
Fix bug (introduced in 0.10.4) where indexing an AnnData with list[bool] would return the wrong result #1332 @ivirshup
Documentation#
Re-add search-as-you-type, this time via readthedocs-sphinx-search #1311 @flying-sheep
Performance#
BaseCompressedSparseDataset’s indptr is cached #1266 @ilan-gold
Improved performance when indexing backed sparse matrices with boolean masks along their major axis #1233 @ilan-gold
0.10.4 2024-01-04#
Bug fixes#
Only try to use Categorical.map(na_action=…) in actually supported Pandas ≥2.1 #1226 @flying-sheep
AnnData.__sizeof__() support for backed datasets #1230 @Neah-Ko
adata[:, []] now returns an AnnData object empty on the appropriate dimensions instead of erroring #1243 @ilan-gold
adata.X[mask] works in newer numpy versions when X is backed #1255 @ilan-gold
adata.X[...] fixed for X as a BaseCompressedSparseDataset with zarr backend #1265 @ilan-gold
Improve read/write error reporting #1273 @flying-sheep
Documentation#
Improve aligned mapping error messages #1252 @flying-sheep
0.10.3 2023-10-31#
Bug fixes#
Prevent pandas from causing infinite recursion when setting a slice of a categorical column #1211 @flying-sheep
Documentation#
Stop showing “Support for Awkward Arrays is currently experimental” warnings when reading, concatenating, slicing, or transposing AnnData objects #1182 @flying-sheep
Other updates#
Fail canary CI job when tests raise unexpected warnings. #1182 @flying-sheep
0.10.2 2023-10-11#
Bug fixes#
Added compatibility layer for packages relying on anndata._core.sparse_dataset.SparseDataset. Note that this API is deprecated and new code should use CSRDataset, CSCDataset, and sparse_dataset() instead. #1185 @ivirshup
Handle deprecation warning from pd.Categorical.map thrown during anndata.concat #1189 @flying-sheep @ivirshup
Fixed extra steps being included in IO tracebacks #1193 @flying-sheep
as_dense argument of write_h5ad no longer writes an array without encoding metadata #1193 @flying-sheep
Performance#
Improved performance of concat_on_disk with dense arrays in some cases #1169 @selmanozleyen
0.10.1 2023-10-08#
Bug fixes#
0.10.0 2023-10-06#
Features#
GPU Support
Dense and sparse CuPy arrays are now supported #1066 @ivirshup
Once you have CuPy arrays in your anndata, use it with rapids-singlecell from v0.9+
anndata now has GPU enabled CI. Made possible by a grant from CZI’s EOSS program and managed via Cirun #1066 #1084 @Zethson @ivirshup
Out of core
Concatenate on-disk anndata objects with anndata.experimental.concat_on_disk() #955 @selmanozleyen
AnnData can now hold dask arrays with scipy.sparse.spmatrix chunks #1114 @ivirshup
Public API for interacting with on disk sparse arrays: sparse_dataset(), CSRDataset, and CSCDataset #765 @ilan-gold @ivirshup
Improved performance for simple slices of OOC sparse arrays #1131 @ivirshup
Improved errors and warnings
Improved error messages when combining dataframes with duplicated column names #1029 @ivirshup
Improved warnings when modifying views of AlignedMappings #1016 @flying-sheep @ivirshup
AnnDataReadErrors have been removed. The original error is now thrown with additional information in a note #1055 @ivirshup
Documentation#
Added zarr examples to file format docs #1162 @ivirshup
Breaking changes#
anndata.AnnData.transpose() no longer copies unnecessarily. If you rely on the copying behavior, call .copy on the resulting object. #1114 @ivirshup
Other updates#
Bump minimum python version to 3.9 #1117 @flying-sheep
Deprecations#
Deprecate anndata.read, which was just an alias for anndata.read_h5ad() #1108 @ivirshup
dtype argument to AnnData constructor is now deprecated #1153 @ivirshup
Bug fixes#
Fix shape inference on initialization when X=None is specified #1121 @flying-sheep
Version 0.9#
0.9.2 2023-07-25#
Bug fixes#
Views of awkward.Arrays now work with awkward>=2.3 #1040 @ivirshup
Fix ufuncs of views like adata.X[:10].cov(axis=0) returning views #1043 @flying-sheep
Fix instantiating AnnData where .X is a DataFrame with an integer valued index #1002 @flying-sheep
Fix read_zarr() when used on zarr.Group #1057 @ivirshup
0.9.1 2023-04-11#
Bug fixes#
0.9.0 2023-04-11#
Features#
Added experimental support for dask arrays #813 @syelman @rahulbshrestha
obsm, varm and uns can now hold AwkwardArrays #647 @giovp, @grst, @ivirshup
Added experimental functions anndata.experimental.read_dispatched() and anndata.experimental.write_dispatched() which allow customizing IO with a callback #873 @ilan-gold @ivirshup
Better error messages during IO #734 @flying-sheep, @ivirshup
Unordered categorical columns are no longer cast to object during anndata.concat() #763 @ivirshup
Documentation#
New tutorials for experimental features
File format description now includes a more formal specification #882 @ivirshup
Interoperability: new page on interoperability with other packages #831 @ivirshup
Expanded docstring with more documentation for backed argument of anndata.read_h5ad() #812 @jeskowagner
Documented how to use alternative compression methods for the h5ad file format, see AnnData.write_h5ad() #857 @nigeil
Breaking changes#
Other updates#
Deprecations#
AnnData.concatenate() is now deprecated in favour of anndata.concat() #845 @ivirshup
Bug fixes#
Fix warning from rename_categories #790 I Virshup
Remove backwards compat checks for categories in uns when we can tell the file is new enough #790 I Virshup
Categorical arrays are now created with a python bool instead of a numpy.bool_ #856
Fixed order dependent outer concatenation bug #904 @ivirshup, reported by @szalata
Fixed bug in renaming categories #790 @ivirshup, reported by @perrin-isir
Fixed IO bug when keys in uns ended in _categories #806 @ivirshup, reported by @Hrovatin
Fixed raw.to_adata not populating obs aligned values when raw was assigned through the setter #939 @ivirshup
Version 0.8#
0.8.0 14th March, 2022#
IO Specification#
Warning
The on disk format of AnnData objects has been updated with this release. Previous releases of anndata will not be able to read all files written by this version. For discussion of possible future solutions to this issue, see #698
Internal handling of IO has been overhauled. This should make it much easier to support new datatypes, use partial access, and use AnnData internally in other formats.
Each element should be tagged with an encoding_type and encoding_version. See updated docs on the file format
Support for nullable integer and boolean data arrays. More data types to come!
Experimental support for low level access to the IO API via read_elem() and write_elem()
Features#
Added PyTorch dataloader AnnLoader and lazy concatenation object AnnCollection. See the tutorials #416 S Rybakov
Compatibility with h5ad files written from Julia #569 I Kats
Many logging messages that should have been warnings are now warnings #650 I Virshup
Significantly more efficient anndata.read_umi_tools() #661 I Virshup
Fixed deepcopy of a copy of a view retaining sparse matrix view mixin type #670 M Klein
In many cases X can now be None #463 R Cannoodt #677 I Virshup. Remaining work is documented in #467.
Removed hard xlrd dependency I Virshup
obs and var dataframes are no longer copied by default on AnnData instantiation #371 I Virshup
Bug fixes#
Dependencies#
xlrd dropped as a hard dependency
Now requires h5py v3.0.0 or newer
Version 0.7#
0.7.8 9 November, 2021#
Bug fixes#
Re-include test helpers #641 I Virshup
0.7.7 9 November, 2021#
Bug fixes#
Fixed propagation of import error when importing write_zarr but not all dependencies are installed #579 R Hillje
Fixed issue with .uns sub-dictionaries being referenced by copies #576 I Virshup
Fixed out-of-bounds integer indices not raising IndexError #630 M Klein
Fixed backed SparseDataset indexing with scipy 1.7.2 #638 I Virshup
Development processes#
Use PEPs 621 (standardized project metadata), 631 (standardized dependencies), and 660 (standardized editable installs) #639 I Virshup
0.7.6 11 April, 2021#
Features#
Added anndata.AnnData.to_memory() for returning an in memory object from a backed one #470 #542 V Bergen I Virshup
anndata.AnnData.write_loom() now writes obs_names and var_names using the Index’s .name attribute, if set #538 I Virshup
Bug fixes#
Fixed bug where np.str_ column names errored at write time #457 I Virshup
Fixed “value.index does not match parent’s axis 0/1 names” error triggered when a data frame is stored in obsm/varm after obs_names/var_names is updated #461 G Eraslan
Fixed adata.write_csvs when adata is a view #462 I Virshup
Fixed null values being converted to strings when strings are converted to categorical #529 I Virshup
Fixed handling of compression key word arguments #536 I Virshup
Fixed copying a backed AnnData from changing which file the original object points at #533 ilia-kats
Fixed a bug where calling AnnData.concatenate on an AnnData with no variables would error #537 I Virshup
Deprecations#
Passing positional arguments to anndata.read_loom() besides the path is now deprecated #538 I Virshup
anndata.read_loom() arguments obsm_names and varm_names are now deprecated in favour of obsm_mapping and varm_mapping #538 I Virshup
0.7.5 12 November, 2020#
Functionality#
Added ipython tab completion and a useful return from .keys to adata.uns #415 I Virshup
Bug fixes#
0.7.4 10 July, 2020#
Concatenation overhaul #378 I Virshup#
New function anndata.concat() for concatenating AnnData objects along either observations or variables
New documentation section: Concatenation
Functionality#
AnnData object created from dataframes with sparse values will have sparse .X #395 I Virshup
Bug fixes#
0.7.3 20 May, 2020#
Bug fixes#
Fixed bug where graphs used too much memory when copying #381 I Virshup
0.7.2 15 May, 2020#
Concatenation overhaul I Virshup#
Functionality#
obs_names_make_unique() is now better at making values unique, and will warn if ambiguities arise #345 M Weiden
obsp is now preferred for storing pairwise relationships between observations. In practice, this means there will be deprecation warnings and reformatting applied to objects which stored connectivities under uns["neighbors"]. Square matrices in uns will no longer be sliced (use .{obs,var}p instead). #337 I Virshup
ImplicitModificationWarning is now exported #315 P Angerer
Better support for ndarray subclasses stored in AnnData objects #335 michalk8
Bug fixes#
Fixed inplace modification of Index objects by the make unique function #348 I Virshup
Passing ambiguous keys to obs_vector() and var_vector() now throws errors #340 I Virshup
Fix instantiating AnnData objects from DataFrame #316 P Angerer
Fixed indexing into AnnData objects with arrays like adata[adata[:, gene].X > 0] #332 I Virshup
Fixed type of version #315 P Angerer
0.7.0 22 January, 2020#
Warning
Breaking changes introduced between 0.6.22.post1 and 0.7:
Elements of AnnDatas don’t have their dimensionality reduced when the main object is subset. This is to maintain consistency when subsetting. See discussion in #145.
Internal modules like anndata.core are private and their contents are not stable: See #174.
The old deprecated attributes .smp*, .add and .data have been removed.
View overhaul #164#
Indexing into a view no longer keeps a reference to intermediate view, see #62.
Views are now lazy. Elements of view of AnnData are not indexed until they’re accessed.
Indexing with scalars no longer reduces dimensionality of contained arrays, see #145.
All elements of AnnData should now follow the same rules about how they’re subset, see #145.
Can now index by observations and variables at the same time.
IO overhaul #167#
Reading and writing has been overhauled for simplification and speed.
Time and memory usage can be half of previous in typical use cases
Zarr backend now supports sparse arrays, and generally is closer to having the same features as HDF5.
Backed mode should see significant speed and memory improvements for access along compressed dimensions and IO. PR #241.
Categoricals can now be ordered (PR #230) and written to disk with a large number of categories (PR #217).
Mapping attributes overhaul (obsm, varm, layers, …)#
New attributes obsp and varp have been added for two dimensional arrays where each axis corresponds to a single axis of the AnnData object. PR #207.
These are intended to store values like cell-by-cell graphs, which are currently stored in uns.
Sparse arrays are now allowed as values in all mapping attributes.
All mapping attributes now share an implementation and will have the same behaviour. PR #164.
Miscellaneous improvements#
Version 0.6#
0.6.0 1 May, 2018#
compatibility with Seurat converter
tremendous speedup for concatenate()
bug fix for deep copy of unstructured annotation after slicing
bug fix for reading HDF5 stored single-category annotations
'outer join' concatenation: adds zeros for concatenation of sparse data and nans for dense data
better memory efficiency in loom exports
Version 0.5#
0.5.0 9 February, 2018#
inform about duplicates in var_names and resolve them using var_names_make_unique()
automatically remove unused categories after slicing
read/write .loom files using loompy 2
fixed read/write for a few text file formats
read UMI tools files: read_umi_tools()
Version 0.4#
0.4.0 23 December, 2017#
read/write .loom files
scalability beyond dataset sizes that fit into memory: see this blog post
AnnData has a raw attribute, which simplifies storing the data matrix when you consider it raw: see the clustering tutorial