anndata.concat

anndata.concat#

anndata.concat(adatas, *, axis='obs', join='inner', merge=None, uns_merge=None, label=None, keys=None, index_unique=None, fill_value=None, pairwise=False, force_lazy=False)[source]#

Concatenates AnnData objects along an axis.

See the concatenation section in the docs for a more in-depth description.

Parameters:

adatas Collection[AnnData] | Mapping[str, AnnData]

The objects to be concatenated. If a Mapping is passed, keys are used for the keys argument and values are concatenated.

axis Literal['obs', 0, 'var', 1] (default: 'obs')

Which axis to concatenate along.

join Literal['inner', 'outer'] | Default (default: 'inner')

How to align values when concatenating. If “outer”, the union of the other axis is taken. If “inner”, the intersection. See concatenation for more.

merge Literal['same', 'unique', 'first', 'only'] | Callable | None (default: None)

How elements not aligned to the axis being concatenated along are selected. Currently implemented strategies include:

None: No elements are kept.
"same": Elements that are the same in each of the objects.
"unique": Elements for which there is only one possible value.
"first": The first element seen at each from each position.
"only": Elements that show up in only one of the objects.

For xarray.Dataset objects, we use their xarray.merge() with override to stay lazy.

uns_merge Literal['same', 'unique', 'first', 'only'] | Callable | None (default: None)

How the elements of .uns are selected. Uses the same set of strategies as the merge argument, except applied recursively.

label str | None (default: None)

Column in axis annotation (i.e. .obs or .var) to place batch information in. If it’s None, no column is added.

keys Collection | None (default: None)

Names for each object being added. These values are used for column values for label or appended to the index if index_unique is not None. Defaults to incrementing integer labels.

index_unique str | None (default: None)

Whether to make the index unique by using the keys. If provided, this is the delimiter between “{orig_idx}{index_unique}{key}”. When None, the original indices are kept.

fill_value Any | None (default: None)

When join="outer", this is the value that will be used to fill the introduced indices. By default, sparse arrays are padded with zeros, while dense arrays and DataFrames are padded with missing values.

pairwise bool (default: False)

Whether pairwise elements along the concatenated dimension should be included. This is False by default, since the resulting arrays are often not meaningful.

force_lazy bool (default: False)

Whether to lazily concatenate elements using dask even when eager concatenation is possible. At the moment, this only affects obs/var and elements of obsm/varm that are xarray Datasets.

Return type:

AnnData

Notes

Warning

If you use join='outer' this fills 0s for sparse data when variables are absent in a batch. Use this with care. Dense data is filled with NaN.

Examples

Preparing example objects

>>> import anndata as ad, pandas as pd, numpy as np
>>> from scipy import sparse
>>> a = ad.AnnData(
...     X=sparse.csr_matrix(np.array([[0, 1], [2, 3]])),
...     obs=pd.DataFrame({"group": ["a", "b"]}, index=["s1", "s2"]),
...     var=pd.DataFrame(index=["var1", "var2"]),
...     varm={
...         "ones": np.ones((2, 5)),
...         "rand": np.random.randn(2, 3),
...         "zeros": np.zeros((2, 5)),
...     },
...     uns={"a": 1, "b": 2, "c": {"c.a": 3, "c.b": 4}},
... )
>>> b = ad.AnnData(
...     X=sparse.csr_matrix(np.array([[4, 5, 6], [7, 8, 9]])),
...     obs=pd.DataFrame(
...         {"group": ["b", "c"], "measure": [1.2, 4.3]}, index=["s3", "s4"]
...     ),
...     var=pd.DataFrame(index=["var1", "var2", "var3"]),
...     varm={"ones": np.ones((3, 5)), "rand": np.random.randn(3, 5)},
...     uns={"a": 1, "b": 3, "c": {"c.b": 4}},
... )
>>> c = ad.AnnData(
...     X=sparse.csr_matrix(np.array([[10, 11], [12, 13]])),
...     obs=pd.DataFrame({"group": ["a", "b"]}, index=["s1", "s2"]),
...     var=pd.DataFrame(index=["var3", "var4"]),
...     uns={"a": 1, "b": 4, "c": {"c.a": 3, "c.b": 4, "c.c": 5}},
... )

Concatenating along different axes

>>> ad.concat([a, b]).to_df()
    var1  var2
s1     0     1
s2     2     3
s3     4     5
s4     7     8
>>> ad.concat([a, c], axis="var").to_df()
    var1  var2  var3  var4
s1     0     1    10    11
s2     2     3    12    13

Inner and outer joins

>>> inner = ad.concat([a, b])  # Joining on intersection of variables
>>> inner
AnnData object with n_obs × n_vars = 4 × 2
    obs: 'group'
    layers: None (.X)
>>> (
...     inner.obs_names.astype("string"),
...     inner.var_names.astype("string"),
... )
(Index(['s1', 's2', 's3', 's4'], dtype='string'),
Index(['var1', 'var2'], dtype='string'))
>>> outer = ad.concat([a, b], join="outer")  # Joining on union of variables
>>> outer
AnnData object with n_obs × n_vars = 4 × 3
    obs: 'group', 'measure'
    layers: None (.X)
>>> outer.var_names.astype("string")
Index(['var1', 'var2', 'var3'], dtype='string')
>>> outer.to_df()  # Sparse arrays are padded with zeroes by default
    var1  var2  var3
s1     0     1     0
s2     2     3     0
s3     4     5     6
s4     7     8     9

Using the axis’ index instead of its name

>>> ad.concat([a, b], axis=0).to_df()  # Equivalent to axis="obs"
    var1  var2
s1     0     1
s2     2     3
s3     4     5
s4     7     8
>>> ad.concat([a, c], axis=1).to_df()  # Equivalent to axis="var"
    var1  var2  var3  var4
s1     0     1    10    11
s2     2     3    12    13

Keeping track of source objects

>>> ad.concat({"a": a, "b": b}, label="batch").obs
   group batch
s1     a     a
s2     b     a
s3     b     b
s4     c     b
>>> ad.concat([a, b], label="batch", keys=["a", "b"]).obs  # Equivalent to previous
   group batch
s1     a     a
s2     b     a
s3     b     b
s4     c     b
>>> ad.concat({"a": a, "b": b}, index_unique="-").obs
     group
s1-a     a
s2-a     b
s3-b     b
s4-b     c

Combining values not aligned to axis of concatenation

>>> ad.concat([a, b], merge="same")
AnnData object with n_obs × n_vars = 4 × 2
    obs: 'group'
    varm: 'ones'
    layers: None (.X)
>>> ad.concat([a, b], merge="unique")
AnnData object with n_obs × n_vars = 4 × 2
    obs: 'group'
    varm: 'ones', 'zeros'
    layers: None (.X)
>>> ad.concat([a, b], merge="first")
AnnData object with n_obs × n_vars = 4 × 2
    obs: 'group'
    varm: 'ones', 'rand', 'zeros'
    layers: None (.X)
>>> ad.concat([a, b], merge="only")
AnnData object with n_obs × n_vars = 4 × 2
    obs: 'group'
    varm: 'zeros'
    layers: None (.X)

The same merge strategies can be used for elements in .uns

>>> dict(ad.concat([a, b, c], uns_merge="same").uns)
{'a': 1, 'c': {'c.b': 4}}
>>> dict(ad.concat([a, b, c], uns_merge="unique").uns)
{'a': 1, 'c': {'c.a': 3, 'c.b': 4, 'c.c': 5}}
>>> dict(ad.concat([a, b, c], uns_merge="only").uns)
{'c': {'c.c': 5}}
>>> dict(ad.concat([a, b, c], uns_merge="first").uns)
{'a': 1, 'b': 2, 'c': {'c.a': 3, 'c.b': 4, 'c.c': 5}}

anndata.concat

Contents

anndata.concat#