anndata.AnnData

class anndata.AnnData(X=None, obs=None, var=None, uns=None, obsm=None, varm=None, raw=None, dtype='float32', shape=None, filename=None, filemode=None, asview=False, *, oidx=None, vidx=None)

An annotated data matrix.

AnnData stores a data matrix .X together with annotations of observations .obs, variables .var and unstructured annotations .uns.

https://falexwolf.de/img/scanpy/anndata.svg

An AnnData object adata can be sliced like a pandas dataframe, for instance, adata_subset = adata[:, list_of_variable_names]. AnnData’s basic structure is similar to R’s ExpressionSet [Huber15]. If an .h5ad-formatted HDF5 backing file is set via .filename, the data remains on disk and is loaded into memory automatically when needed. See this blog post for more details.

Parameters:
X : Union[ndarray, spmatrix, None]

A #observations × #variables data matrix. A view of the data is used if the data type matches, otherwise, a copy is made.

obs : Union[DataFrame, Mapping[Any, Iterable[Any]], ndarray, None]

Key-indexed one-dimensional observation annotation of length #observations.

var : Union[DataFrame, Mapping[Any, Iterable[Any]], ndarray, None]

Key-indexed one-dimensional variable annotation of length #variables.

uns : Optional[Mapping[Any, Any]]

Unstructured annotation for the whole dataset.

obsm : Union[ndarray, Mapping[str, Sequence[Any]], None]

Key-indexed multi-dimensional observation annotation of length #observations.

varm : Union[ndarray, Mapping[str, Sequence[Any]], None]

Key-indexed multi-dimensional variable annotation of length #variables.

dtype : Union[dtype, str]

Data type used for storage.

shape : Optional[tuple]

Shape tuple (#observations, #variables). Can only be provided if X is None.

filename : Union[Path, str, None]

Name of backing file. See anndata.h5py.File.

filemode : Optional[str]

Open mode of backing file. See anndata.h5py.File.

Notes

Multi-dimensional annotations are stored in .obsm and .varm.

Sparse matrices of shape .n_obs × .n_obs stored in the unstructured annotations .uns are also sliced.

AnnData stores observations (samples) of variables (features) in the rows of a matrix. This is the convention of the modern classics of statistics [Hastie09] and machine learning [Murphy12], the convention of dataframes both in R and Python and the established statistics and machine learning packages in Python (statsmodels, scikit-learn).

A data matrix is flattened if either #observations (n_obs) or #variables (n_vars) is 1, so that numpy’s slicing behavior is reproduced:

import numpy as np
from anndata import AnnData
adata = AnnData(np.ones((2, 2)))
adata[:, 0].X == adata.X[:, 0]

Methods

__init__([X, obs, var, uns, obsm, varm, …]) Initialize self.
chunked_X([chunk_size])
concatenate(*adatas[, join, batch_key, …]) Concatenate along the observations axis.
copy([filename]) Full copy, optionally on disk.
obs_keys() List keys of observation annotation .obs.
obs_names_make_unique([join]) Makes the index unique by appending ‘1’, ‘2’, etc.
obsm_keys() List keys of observation annotation .obsm.
rename_categories(key, categories) Rename categories of annotation key in .obs, .var and .uns.
transpose() Transpose whole object.
uns_keys() List keys of unstructured annotation.
var_keys() List keys of variable annotation .var.
var_names_make_unique([join]) Makes the index unique by appending ‘1’, ‘2’, etc.
varm_keys() List keys of variable annotation .varm.
write([filename, compression, compression_opts]) Write .h5ad-formatted HDF5 file and close a potential backing file.
write_csvs(dirname[, skip_data, sep]) Write annotation to .csv files.
write_loom(filename) Write .loom-formatted HDF5 file.

Attributes

T Transpose whole object.
X Data matrix of shape n_obs × n_vars (np.ndarray, sp.sparse.spmatrix) or None.
filename Change to backing mode by setting the filename of a .h5ad file.
isbacked True if object is backed on disk, False otherwise.
isview True if object is view of another AnnData object, False otherwise.
n_obs Number of observations.
n_vars Number of variables/features.
obs One-dimensional annotation of observations (pd.DataFrame).
obs_names Names of observations (alias for .obs.index).
obsm Multi-dimensional annotation of observations (mutable structured np.ndarray).
raw Store raw version of .X and .var as .raw.X and .raw.var.
shape Shape of data matrix – (n_obs, n_vars).
uns Unstructured annotation (ordered dictionary).
var One-dimensional annotation of variables/features (pd.DataFrame).
var_names Names of variables (alias for .var.index).
varm Multi-dimensional annotation of variables/features (mutable structured np.ndarray).