anndata.AnnData

class anndata.AnnData(X=None, obs=None, var=None, uns=None, obsm=None, varm=None, layers=None, raw=None, dtype='float32', shape=None, filename=None, filemode=None, asview=False, *, oidx=None, vidx=None)

An annotated data matrix.

AnnData stores a data matrix .X together with annotations of observations .obs, variables .var and unstructured annotations .uns.

https://falexwolf.de/img/scanpy/anndata.svg

An AnnData object adata can be sliced like a pandas dataframe, for instance, adata_subset = adata[:, list_of_variable_names]. AnnData’s basic structure is similar to R’s ExpressionSet [Huber15]. When an .h5ad-formatted HDF5 backing file is set via .filename, the data remains on disk and is loaded into memory automatically when needed. See this blog post for more details.

Parameters:
X : Union[ndarray, spmatrix, DataFrame, None]

A #observations × #variables data matrix. A view of the data is used if the data type matches; otherwise, a copy is made.

obs : Union[DataFrame, Mapping[Any, Iterable[Any]], ndarray, None]

Key-indexed one-dimensional observations annotation of length #observations.

var : Union[DataFrame, Mapping[Any, Iterable[Any]], ndarray, None]

Key-indexed one-dimensional variables annotation of length #variables.

uns : Optional[Mapping[Any, Any]]

Key-indexed unstructured annotation.

obsm : Union[ndarray, Mapping[str, Sequence[Any]], None]

Key-indexed multi-dimensional observations annotation of length #observations.

varm : Union[ndarray, Mapping[str, Sequence[Any]], None]

Key-indexed multi-dimensional variables annotation of length #variables.

dtype : Union[dtype, str]

Data type used for storage.

shape : Optional[tuple]

Shape tuple (#observations, #variables). Can only be provided if X is None.

filename : Optional[PathLike]

Name of backing file. See anndata.h5py.File.

filemode : Optional[str]

Open mode of backing file. See anndata.h5py.File.

layers : Optional[Mapping]

Dictionary with keys as layers’ names and values as matrices of the same dimensions as X.

Notes

Multi-dimensional annotations are stored in .obsm and .varm.

Indexing into an AnnData object with a numeric index is positional, like pandas’ .iloc, while indexing with a string or categorical behaves like .loc.

If the unstructured annotations .uns contain a sparse matrix of shape .n_obs × .n_obs, that matrix is sliced as well when indexing with [].

A data matrix is flattened if either n_obs or n_vars is 1, so that numpy’s slicing behavior is reproduced:

import numpy as np
from anndata import AnnData
adata = AnnData(np.ones((2, 2)))
adata[:, 0].X == adata.X[:, 0]

AnnData stores observations (samples) of variables (features) in the rows of a matrix. This is the convention of the modern classics of statistics [Hastie09] and machine learning [Murphy12]. It is also the convention of dataframes in both R and Python, and of the established statistics and machine learning packages in Python (statsmodels, scikit-learn).

Attributes

T Transpose whole object.
X Data matrix of shape n_obs × n_vars.
filename Change to backing mode by setting the filename of a .h5ad file.
isbacked True if object is backed on disk, False otherwise.
isview True if object is view of another AnnData object, False otherwise.
layers Dictionary-like object with values of the same dimensions as .X.
n_obs Number of observations.
n_vars Number of variables/features.
obs One-dimensional annotation of observations (pd.DataFrame).
obs_names Names of observations (alias for .obs.index).
obsm Multi-dimensional annotation of observations (mutable structured np.ndarray).
raw Store raw version of .X and .var as .raw.X and .raw.var.
shape Shape of data matrix – (n_obs, n_vars).
uns Unstructured annotation (ordered dictionary).
var One-dimensional annotation of variables/features (pd.DataFrame).
var_names Names of variables (alias for .var.index).
varm Multi-dimensional annotation of variables/features (mutable structured np.ndarray).

Methods

chunk_X([select, replace]) Return a chunk of the data matrix .X with random or specified indices.
chunked_X([chunk_size]) Return an iterator over the rows of the data matrix .X.
concatenate(*adatas[, join, batch_key, …]) Concatenate along the observations axis.
copy([filename]) Full copy, optionally on disk.
obs_keys() List keys of observation annotation .obs.
obs_names_make_unique([join]) Makes the index unique by appending ‘1’, ‘2’, etc.
obsm_keys() List keys of observation annotation .obsm.
rename_categories(key, categories) Rename categories of annotation key in .obs, .var and .uns.
transpose() Transpose whole object.
uns_keys() List keys of unstructured annotation.
var_keys() List keys of variable annotation .var.
var_names_make_unique([join]) Makes the index unique by appending ‘1’, ‘2’, etc.
varm_keys() List keys of variable annotation .varm.
write([filename, compression, …]) Write .h5ad-formatted hdf5 file and close a potential backing file.
write_csvs(dirname[, skip_data, sep]) Write annotation to .csv files.
write_loom(filename) Write .loom-formatted hdf5 file.
write_zarr(store, chunks) Write a hierarchical Zarr array store.