anndata.AnnData

class anndata.AnnData(X=None, obs=None, var=None, uns=None, obsm=None, varm=None, layers=None, raw=None, dtype='float32', shape=None, filename=None, filemode=None, asview=False, *, obsp=None, varp=None, oidx=None, vidx=None)

An annotated data matrix.

AnnData stores a data matrix X together with annotations of observations obs, variables var and unstructured annotations uns.

https://falexwolf.de/img/scanpy/anndata.svg

An AnnData object adata can be sliced like a pandas dataframe, for instance, adata_subset = adata[:, list_of_variable_names]. AnnData’s basic structure is similar to R’s ExpressionSet [Huber15]. If setting an .h5ad-formatted HDF5 backing file .filename, data remains on the disk but is automatically loaded into memory if needed. See this blog post for more details.

Parameters
X : Union[ndarray, spmatrix, DataFrame, None] (default: None)

A #observations × #variables data matrix. A view of the data is used if the data type matches, otherwise, a copy is made.

obs : Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)

Key-indexed one-dimensional observations annotation of length #observations.

var : Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)

Key-indexed one-dimensional variables annotation of length #variables.

uns : Optional[Mapping[str, Any]] (default: None)

Key-indexed unstructured annotation.

obsm : Union[ndarray, Mapping[str, Sequence[Any]], None] (default: None)

Key-indexed multi-dimensional observations annotation of length #observations. If passing an ndarray, it needs to have a structured datatype.

varm : Union[ndarray, Mapping[str, Sequence[Any]], None] (default: None)

Key-indexed multi-dimensional variables annotation of length #variables. If passing an ndarray, it needs to have a structured datatype.

layers : Optional[Mapping[str, Union[ndarray, spmatrix]]] (default: None)

Key-indexed multi-dimensional arrays aligned to dimensions of X.

dtype : Union[dtype, str] (default: 'float32')

Data type used for storage.

shape : Optional[Tuple[int, int]] (default: None)

Shape tuple (#observations, #variables). Can only be provided if X is None.

filename : Optional[PathLike] (default: None)

Name of backing file. See anndata.h5py.File.

filemode : Optional[Literal[‘r’, ‘r+’]] (default: None)

Open mode of backing file. See anndata.h5py.File.

Notes

AnnData stores observations (samples) of variables (features) in the rows of a matrix. This is the convention of the modern classics of statistics [Hastie09] and machine learning [Murphy12], the convention of dataframes both in R and Python and the established statistics and machine learning packages in Python (statsmodels, scikit-learn).

One-dimensional annotations of the observations and variables are stored in the obs and var attributes as DataFrames. This is intended for metrics calculated over their axes. Multi-dimensional annotations are stored in obsm and varm, which are aligned to the object’s observation and variable dimensions respectively. Additional measurements across both observations and variables are stored in layers.

Indexing into an AnnData object can be performed by relative position with numeric indices (like pandas’ iloc), or by labels (like loc). To avoid ambiguity, indexes of the AnnData object are converted to strings by the constructor.

Subsetting an AnnData object by indexing into it will also subset its elements according to the dimensions they were aligned to. This means an operation like adata[list_of_obs, :] will also subset (albeit lazily) obs, obsm, and layers.

If the unstructured annotations uns contain a sparse matrix of shape n_obs × n_obs, these are subset with the observation dimension.

Subsetting an AnnData object returns a view into the original object, meaning very little additional memory is used upon subsetting. This is achieved through laziness, meaning subsetting the constituent arrays is deferred until they are accessed. Copying a view causes an equivalent “real” AnnData object to be generated. Attempting to modify a view (at any attribute except X) is handled in a copy-on-modify manner, meaning the object is initialized in place. Here’s an example:

batch1 = adata[adata.obs["batch"] == "batch1", :]
batch1.obs["value"] = 0  # This makes batch1 a "real" AnnData object, with its own data

At the end of this snippet: adata was not modified, and batch1 is its own AnnData object with its own data.

Similar to Bioconductor’s ExpressionSet, subsetting an AnnData object doesn’t reduce the dimensions of its constituent arrays. This differs from the behaviour of libraries like pandas, numpy, and xarray. However, unlike the classes exposed by those libraries, there is no concept of a one-dimensional AnnData object. They have two inherent dimensions, obs and var. Additionally, maintaining the dimensionality of the AnnData object allows for consistent handling of scipy.sparse sparse matrices and numpy arrays.

Attributes

T

Transpose whole object.

X

Data matrix of shape n_obs × n_vars.

filename

Change to backing mode by setting the filename of a .h5ad file.

isbacked

True if object is backed on disk, False otherwise.

isview

True if object is view of another AnnData object, False otherwise.

layers

Dictionary-like object with values of the same dimensions as X.

n_obs

Number of observations.

n_vars

Number of variables/features.

obs

One-dimensional annotation of observations (pd.DataFrame).

obs_names

Names of observations (alias for .obs.index).

obsm

Multi-dimensional annotation of observations (mutable structured ndarray).

obsp

Pairwise annotation of observations, a mutable mapping with array-like values.

raw

Store raw version of X and var as .raw.X and .raw.var.

shape

Shape of data matrix (n_obs, n_vars).

uns

Unstructured annotation (ordered dictionary).

var

One-dimensional annotation of variables/features (pd.DataFrame).

var_names

Names of variables (alias for .var.index).

varm

Multi-dimensional annotation of variables/features (mutable structured ndarray).

varp

Pairwise annotation of variables, a mutable mapping with array-like values.

Methods

chunk_X([select, replace])

Return a chunk of the data matrix X with random or specified indices.

chunked_X([chunk_size])

Return an iterator over the rows of the data matrix X.

concatenate(*adatas[, join, batch_key, …])

Concatenate along the observations axis.

copy([filename])

Full copy, optionally on disk.

obs_keys()

List keys of observation annotation obs.

obs_names_make_unique([join])

Makes the index unique by appending ‘1’, ‘2’, etc.

obs_vector(k, *[, layer])

Convenience function for returning a 1 dimensional ndarray of values from .X, .layers[k], or .obs.

obsm_keys()

List keys of observation annotation obsm.

rename_categories(key, categories)

Rename categories of annotation key in obs, var, and uns.

strings_to_categoricals([df])

Transform string annotations to categoricals.

to_df()

Generate shallow DataFrame.

transpose()

Transpose whole object.

uns_keys()

List keys of unstructured annotation.

var_keys()

List keys of variable annotation var.

var_names_make_unique([join])

Makes the index unique by appending ‘1’, ‘2’, etc.

var_vector(k, *[, layer])

Convenience function for returning a 1 dimensional ndarray of values from .X, .layers[k], or .var.

varm_keys()

List keys of variable annotation varm.

write([filename, compression, …])

Write .h5ad-formatted hdf5 file.

write_csvs(dirname[, skip_data, sep])

Write annotation to .csv files.

write_h5ad([filename, compression, …])

Write .h5ad-formatted hdf5 file.

write_loom(filename[, write_obsm_varm])

Write .loom-formatted hdf5 file.

write_zarr(store[, chunks])

Write a hierarchical Zarr array store.