anndata.AnnData
class anndata.AnnData(X=None, obs=None, var=None, uns=None, *, obsm=None, varm=None, layers=None, raw=None, dtype=None, shape=None, filename=None, filemode=None, asview=False, obsp=None, varp=None, oidx=None, vidx=None)
An annotated data matrix.

AnnData stores a data matrix X together with annotations of observations obs (obsm, obsp), variables var (varm, varp), and unstructured annotations uns.

An AnnData object adata can be sliced like a DataFrame, for instance adata_subset = adata[:, list_of_variable_names]. AnnData’s basic structure is similar to R’s ExpressionSet [Huber15]. If an .h5ad-formatted HDF5 backing file is set via .filename, the data remain on disk but are automatically loaded into memory when needed.

Parameters:
- X: ndarray | MaskedArray | csr_matrix | csc_matrix | csr_array | csc_array | Dataset | Array | ZappyArray | CSRDataset | CSCDataset | Array | ndarray | spmatrix | DataFrame | None (default: None)
  A #observations × #variables data matrix. A view of the data is used if the data type matches; otherwise, a copy is made.
- obs: DataFrame | Mapping[str, Iterable[Any]] | None (default: None)
  Key-indexed one-dimensional observations annotation of length #observations.
- var: DataFrame | Mapping[str, Iterable[Any]] | None (default: None)
  Key-indexed one-dimensional variables annotation of length #variables.
- uns: Mapping[str, Any] | None (default: None)
  Key-indexed unstructured annotation.
- obsm: ndarray | Mapping[str, Sequence[Any]] | None (default: None)
  Key-indexed multi-dimensional observations annotation of length #observations. If passing an ndarray, it needs to have a structured datatype.
- varm: ndarray | Mapping[str, Sequence[Any]] | None (default: None)
  Key-indexed multi-dimensional variables annotation of length #variables. If passing an ndarray, it needs to have a structured datatype.
- layers: Mapping[str, ndarray | MaskedArray | csr_matrix | csc_matrix | csr_array | csc_array | Dataset | Array | ZappyArray | CSRDataset | CSCDataset | Array | ndarray | spmatrix] | None (default: None)
  Key-indexed multi-dimensional arrays aligned to the dimensions of X.
- shape: tuple[int, int] | None (default: None)
  Shape tuple (#observations, #variables). Can only be provided if X is None.
- filename: PathLike[str] | str | None (default: None)
  Name of the backing file. See h5py.File.
- filemode: Optional[Literal['r', 'r+']] (default: None)
  Open mode of the backing file. See h5py.File.
 
See also

io.read_h5ad, io.read_csv, io.read_excel, io.read_hdf, io.read_loom, io.read_zarr, io.read_mtx, io.read_text, io.read_umi_tools

Notes

AnnData stores observations (samples) of variables/features in the rows of a matrix. This is the convention of the modern classics of statistics [Hastie09] and machine learning [Murphy12], the convention of dataframes both in R and Python, and the convention of the established statistics and machine learning packages in Python (statsmodels, scikit-learn).

Single-dimensional annotations of the observations and variables are stored in the obs and var attributes as DataFrames. They are intended for metrics calculated over their axes. Multi-dimensional annotations are stored in obsm and varm, which are aligned to the object’s observation and variable dimensions, respectively. Square matrices representing graphs are stored in obsp and varp, with both of their own dimensions aligned to their associated axis. Additional measurements across both observations and variables are stored in layers.

Indexing into an AnnData object can be performed by relative position with numeric indices (like pandas’ .iloc), or by labels (like .loc). To avoid ambiguity with numeric indexing into observations or variables, the indexes of the AnnData object are converted to strings by the constructor.

Subsetting an AnnData object by indexing into it also subsets its elements according to the dimensions they are aligned to. This means an operation like adata[list_of_obs, :] will also subset obs, obsm, and layers.

Subsetting an AnnData object returns a view into the original object, meaning very little additional memory is used upon subsetting. This is achieved lazily, meaning that the constituent arrays are subset on access. Copying a view causes an equivalent “real” AnnData object to be generated. Attempting to modify a view (at any attribute except X) is handled in a copy-on-modify manner, meaning the object is initialized in place.
Here’s an example:

    batch1 = adata[adata.obs["batch"] == "batch1", :]
    batch1.obs["value"] = 0  # This makes batch1 a "real" AnnData object

At the end of this snippet, adata was not modified, and batch1 is its own AnnData object with its own data.

Similar to Bioconductor’s ExpressionSet and scipy.sparse matrices, subsetting an AnnData object retains the dimensionality of its constituent arrays. Therefore, unlike with the classes exposed by pandas, numpy, and xarray, there is no concept of a one-dimensional AnnData object. AnnData objects always have two inherent dimensions, obs and var. Additionally, maintaining the dimensionality of the AnnData object allows for consistent handling of scipy.sparse matrices and numpy arrays.

Attributes

- T: Transpose whole object.
- filename: Change to backing mode by setting the filename of a .h5ad file.
- is_view: True if object is view of another AnnData object, False otherwise.
- isbacked: True if object is backed on disk, False otherwise.
- layers: Dictionary-like object with values of the same dimensions as X.
- n_obs: Number of observations.
- n_vars: Number of variables/features.
- obs: One-dimensional annotation of observations (pd.DataFrame).
- obs_names: Names of observations (alias for .obs.index).
- obsm: Multi-dimensional annotation of observations (mutable structured ndarray).
- obsp: Pairwise annotation of observations, a mutable mapping with array-like values.
- uns: Unstructured annotation (ordered dictionary).
- var: One-dimensional annotation of variables/features (pd.DataFrame).
- var_names: Names of variables (alias for .var.index).
- varm: Multi-dimensional annotation of variables/features (mutable structured ndarray).
- varp: Pairwise annotation of variables/features, a mutable mapping with array-like values.

Methods

- chunk_X([select, replace]): Return a chunk of the data matrix X with random or specified indices.
- chunked_X([chunk_size]): Return an iterator over the rows of the data matrix X.
- concatenate(*adatas[, join, batch_key, ...]): Concatenate along the observations axis.
- copy([filename]): Full copy, optionally on disk.
- obs_names_make_unique([join]): Make the index unique by appending a number string to each duplicate index element: '1', '2', etc.
- obs_vector(k, *[, layer]): Convenience function for returning a one-dimensional ndarray of values from X, layers[k], or obs.
- rename_categories(key, categories)
- strings_to_categoricals([df]): Transform string annotations to categoricals.
- to_df([layer]): Generate shallow DataFrame.
- to_memory(*[, copy]): Return a new AnnData object with all backed arrays loaded into memory.
- transpose(): Transpose whole object.
- var_names_make_unique([join]): Make the index unique by appending a number string to each duplicate index element: '1', '2', etc.
- var_vector(k, *[, layer]): Convenience function for returning a one-dimensional ndarray of values from X, layers[k], or var.
- write([filename, ...]): Write .h5ad-formatted HDF5 file.
- write_csvs(dirname, *[, skip_data, sep]): Write annotation to .csv files.
- write_h5ad([filename, ...]): Write .h5ad-formatted HDF5 file.
- write_loom(filename, *[, write_obsm_varm]): Write .loom-formatted HDF5 file.
- write_zarr(store, *[, chunks, ...]): Write a hierarchical Zarr array store.