Accessors and paths

anndata.acc provides accessors that create references to axis-aligned 1D and 2D arrays in AnnData objects. You can use these to drive e.g. plotting or validation code. For these purposes, they are

  1. easy to create:

    The central A object is an accessor for the whole AnnData object, and allows you to create AdRef objects, which are references to arrays spanning one or two dimensions of an AnnData object (without being bound to a specific object):

    >>> from anndata.acc import A
    >>> A.X[:, "gene-3"]  # reference to `adata[:, "gene-3"].X` as 1D vector
    A.X[:, 'gene-3']
    >>> type(A.X[:, "gene-3"])
    <class 'anndata.acc.AdRef'>
    

    … and to use:

    >>> import scanpy as sc
    >>> adata = sc.datasets.pbmc3k_processed()
    

    E.g. to check if adata.varm["PCs"] has at least 30 columns (column indices are zero-based, so the 30th column has index 29):

    >>> A.varm["PCs"][:, 29] in adata
    True
    

    or to extract the referenced vector:

    >>> ref = A.obs["louvain"]
    >>> adata[ref].categories[:2]
    Index(['CD4 T cells', 'CD14+ Monocytes'], dtype=...)
    
  2. introspectable:

    AdRefs have the AdRef.dims, AdRef.idx, and AdRef.acc attributes, allowing you to inspect all relevant properties.

    >>> pc0 = A.obsm["pca"][:, 0]
    >>> pc0
    A.obsm['pca'][:, 0]
    >>> pc0.idx
    0
    >>> pc0.acc
    A.obsm['pca']
    >>> A.var["symbol"].dims
    {'var'}
    >>> pc0.acc.k
    'pca'
    
  3. convenient:

    Want to reference multiple vectors from the same object? Pass a list of indices to the vector accessor:

    >>> A.obsp["connectivities"][:, ["cell0", "cell1"]]
    [A.obsp['connectivities'][:, 'cell0'], A.obsp['connectivities'][:, 'cell1']]
    
  4. extensible: see extending accessors.

API & Glossary#

The central accessor is A:

anndata.acc.A: AdAcc[AdRef] = A

A global accessor to create AdRefs.

See AdAcc for examples of how to use it to create references (i.e., AdRefs).

AdAcc([ref_class, layer_cls, meta_cls, ...])

Accessor to create AdRefs (A).

AdRef(acc, idx)

A reference to a 1D or 2D array along one or two dimensions of an AnnData object.

reference#

An instance of AdRef. References a 1D or 2D array in AnnData objects. It is independent of individual objects and can be inspected, checked for equality, used as mapping keys, or applied to concrete objects, e.g. via ref in adata or adata[ref]. An example of this would be A.obsm["d"][:, 2] but not A.obsm["d"], which is a reference accessor.
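To make the idea of an object-independent reference concrete, here is a toy model in plain Python. It is not how anndata implements AdRef: `ToyRef`, its fields, and `resolve` are hypothetical names, and nested dicts of row lists stand in for an AnnData object. The sketch only shows why a frozen, value-like object can be compared, used as a mapping key, and applied to several concrete objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a reference: container name, key, column index."""

    container: str
    key: str
    column: int

    def resolve(self, obj: dict) -> list:
        """Apply the reference to a concrete object and return the column."""
        return [row[self.column] for row in obj[self.container][self.key]]


# Two unrelated toy "objects" that share a layout (assumed data, not AnnData).
toy_a = {"obsm": {"d": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]}}
toy_b = {"obsm": {"d": [[1.0, 2.0, 3.0]]}}

ref = ToyRef("obsm", "d", 2)
same_ref = ToyRef("obsm", "d", 2)

# Frozen dataclasses compare by value and are hashable, so refs work as dict keys.
labels = {ref: "third component"}

# The same reference can be applied to any object with a matching layout.
column_a = ref.resolve(toy_a)
column_b = ref.resolve(toy_b)
```

Because the reference holds no pointer to `toy_a` or `toy_b`, it can be built once (for example in a plotting configuration) and resolved later against whichever object is at hand.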

accessor#

An instance of any of the *Acc classes, i.e. AdAcc, or subclasses of MapAcc or RefAcc. Can be descended into via attribute access to get deeper accessors (e.g. A.obs) or references (e.g. A.obs.index, A.obs["c"]). Their presence in an AnnData object can also be checked via acc in adata.

reference accessor#

RefAcc subclasses directly create references (AdRef instances). They can be accessed from these references using the AdRef.acc attribute, and are therefore useful in matches or isinstance() checks:

RefAcc(*, ref_class)

Abstract base class for reference accessors.

Class     has attributes  available as A.???    Example reference creation
--------  --------------  --------------------  ------------------------------------------
LayerAcc  k               X, layers[key]        A.X[:, :], A.layers["c"][:, "g0"]
MetaAcc   dim             obs, var              A.obs["a"], A.var["b"]
MultiAcc  dim, k          obsm[key], varm[key]  A.obsm["d"][:, 2]
GraphAcc  dim, k          obsp[key], varp[key]  A.obsp["e"][:, "c1"], A.varp["e"]["g0", :]

mapping accessor#

MapAcc subclasses can be indexed with a string to create reference accessors, e.g. A.layers or A.obsm are both MapAccs, while A.layers["a"] is a LayerAcc and A.obsm["b"] is a MultiAcc. MapAccs are mostly useful for extending, but might be useful for APIs that need to refer to a Mapping of arrays:

MapAcc()

Accessor for mapping containers.

LayerMapAcc(*, ref_class[, ref_acc_cls])

Accessor for layers (A.layers).

MultiMapAcc(dim, *, ref_class[, ref_acc_cls])

Accessor for multi-dimensional array containers (A.obsm/A.varm).

GraphMapAcc(dim, *, ref_class[, ref_acc_cls])

Accessor for graph containers (A.obsp/A.varp).

Extending accessors#

There are three layers of extensibility:

  1. subclass AdRef and create a new AdAcc instance that produces instances of the subclass:

    import scanpy as sc
    from matplotlib import pyplot as plt
    from anndata.acc import AdAcc, AdRef
    
    class MplRef(AdRef, str):
        """Matplotlib will only treat strings as references, so we subclass `str`."""
        def __new__(cls, acc, idx) -> "MplRef":
            obj = str.__new__(cls, str(AdRef(acc, idx)))
            AdRef.__init__(obj, acc, idx)
            return obj
    
    A = AdAcc(ref_class=MplRef)
    
    adata = sc.datasets.pbmc3k_processed()
    plt.scatter(*A.obsm["X_umap"][:, [0, 1]], c=A.obs["n_counts"], data=adata)
    
  2. subclass one or more of the reference accessors, and create a new AdAcc instance:

    >>> from anndata.acc import AdAcc, AdRef, MetaAcc
    >>>
    >>> class TwoDRef(AdRef):
    ...     """A reference able to refer to multiple metadata columns."""
    ...     ...
    >>>
    >>> class MyMetaAcc(MetaAcc):
    ...     def __getitem__(self, k):
    ...         if isinstance(k, list):
    ...             # override default behavior of returning a list of refs
    ...             return self.ref_class(self, k)
    ...         return super().__getitem__(k)
    >>>
    >>> A = AdAcc(ref_class=TwoDRef, meta_cls=MyMetaAcc)
    >>> A.obs[["a", "b"]]
    A.obs[['a', 'b']]
    
  3. subclass AdAcc to add new accessors:

    >>> from dataclasses import dataclass, field
    >>> from anndata.acc import AdAcc, MetaAcc
    >>>
    >>> @dataclass(frozen=True)
    ... class EHRAcc(AdAcc):
    ...     tem: MetaAcc = field(init=False)
    ...     def __post_init__(self) -> None:
    ...         super().__post_init__()
    ...         tem = MetaAcc("tem", ref_class=self.ref_class)
    ...         object.__setattr__(self, "tem", tem)  # necessary because it’s frozen
    >>>
    >>> A = EHRAcc()
    >>> A.tem["visit_id"]
    A.tem['visit_id']
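The MplRef example in the first extension layer relies on a general Python idiom: because `str` is immutable, a subclass that wants to carry extra attributes has to build the string value in `__new__` rather than in `__init__`. The following standalone sketch shows that idiom without anndata or Matplotlib. `TaggedStr`, `container`, and `key` are hypothetical names, and a plain dict stands in for whatever string-keyed lookup the consuming library performs.

```python
class TaggedStr(str):
    """A string that also remembers the structured parts it was built from."""

    def __new__(cls, container: str, key: str) -> "TaggedStr":
        # The string value must be fixed here; it cannot be changed in __init__.
        obj = super().__new__(cls, f"{container}[{key!r}]")
        # Instances of str subclasses have a __dict__, so extra attributes are fine.
        obj.container = container
        obj.key = key
        return obj


ref = TaggedStr("obs", "n_counts")

# Code that only understands strings sees an ordinary string ...
is_plain_string = isinstance(ref, str)
text_value = str(ref)

# ... and hashing/equality are inherited from str, so string-keyed lookups work.
table = {"obs['n_counts']": [10, 20, 30]}
looked_up = table[ref]

# Code that knows about the subclass can still read the structured parts.
parts = (ref.container, ref.key)
```

The design trade-off is that such an object is equal to, and hashes like, its plain string form, which is what makes it acceptable to string-only APIs but also means two differently structured references with the same text would collide.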