anndata.AnnData.concatenate#
- AnnData.concatenate(*adatas, join='inner', batch_key='batch', batch_categories=None, uns_merge=None, index_unique='-', fill_value=None)[source]#
- Concatenate along the observations axis. - The - uns,- varmand- obsmattributes are ignored.- Currently, this works only in - 'memory'mode.- Note - For more flexible and efficient concatenation, see: - concat().- Parameters:
- adatas AnnData
- AnnData matrices to concatenate with. Each matrix is referred to as a “batch”. 
- join str(default:'inner')
- Use intersection ( - 'inner') or union (- 'outer') of variables.
- batch_key str(default:'batch')
- Add the batch annotation to - obsusing this key.
- batch_categories Sequence[Any] (default:None)
- Use these as categories for the batch annotation. By default, use increasing numbers. 
- uns_merge str|None(default:None)
- Strategy to use for merging entries of uns. These strategies are applied recusivley. Currently implemented strategies include: - None: The default. The concatenated object will just have an empty dict for- uns.
- "same": Only entries which have the same value in all AnnData objects are kept.
- "unique": Only entries which have one unique value in all AnnData objects are kept.
- "first": The first non-missing value is used.
- "only": A value is included if only one of the AnnData objects has a value at this path.
 
- index_unique str|None(default:'-')
- Make the index unique by joining the existing index names with the batch category, using - index_unique='-', for instance. Provide- Noneto keep existing indices.
- fill_value default: None
- Scalar value to fill newly missing values in arrays with. Note: only applies to arrays and sparse matrices (not dataframes) and will only be used if - join="outer".- Note - If not provided, the default value is - 0for sparse matrices and- np.nanfor numpy arrays. See the examples below for more information.
 
- adatas 
- Return type:
- Returns:
- AnnDataThe concatenated- AnnData, where- adata.obs[batch_key]stores a categorical variable labeling the batch.
 - Notes - Warning - If you use - join='outer'this fills 0s for sparse data when variables are absent in a batch. Use this with care. Dense data is filled with- NaN. See the examples.- Examples - Joining on intersection of variables. - >>> adata1 = AnnData( ... np.array([[1, 2, 3], [4, 5, 6]]), ... dict(obs_names=['s1', 's2'], anno1=['c1', 'c2']), ... dict(var_names=['a', 'b', 'c'], annoA=[0, 1, 2]), ... ) >>> adata2 = AnnData( ... np.array([[1, 2, 3], [4, 5, 6]]), ... dict(obs_names=['s3', 's4'], anno1=['c3', 'c4']), ... dict(var_names=['d', 'c', 'b'], annoA=[0, 1, 2]), ... ) >>> adata3 = AnnData( ... np.array([[1, 2, 3], [4, 5, 6]]), ... dict(obs_names=['s1', 's2'], anno2=['d3', 'd4']), ... dict(var_names=['d', 'c', 'b'], annoA=[0, 2, 3], annoB=[0, 1, 2]), ... ) >>> adata = adata1.concatenate(adata2, adata3) >>> adata AnnData object with n_obs × n_vars = 6 × 2 obs: 'anno1', 'anno2', 'batch' var: 'annoA-0', 'annoA-1', 'annoA-2', 'annoB-2' >>> adata.X array([[2, 3], [5, 6], [3, 2], [6, 5], [3, 2], [6, 5]]) >>> adata.obs anno1 anno2 batch s1-0 c1 NaN 0 s2-0 c2 NaN 0 s3-1 c3 NaN 1 s4-1 c4 NaN 1 s1-2 NaN d3 2 s2-2 NaN d4 2 >>> adata.var.T b c annoA-0 1 2 annoA-1 2 1 annoA-2 3 2 annoB-2 2 1 - Joining on the union of variables. - >>> outer = adata1.concatenate(adata2, adata3, join='outer') >>> outer AnnData object with n_obs × n_vars = 6 × 4 obs: 'anno1', 'anno2', 'batch' var: 'annoA-0', 'annoA-1', 'annoA-2', 'annoB-2' >>> outer.var.T a b c d annoA-0 0.0 1.0 2.0 NaN annoA-1 NaN 2.0 1.0 0.0 annoA-2 NaN 3.0 2.0 0.0 annoB-2 NaN 2.0 1.0 0.0 >>> outer.var_names Index(['a', 'b', 'c', 'd'], dtype='object') >>> outer.X array([[ 1., 2., 3., nan], [ 4., 5., 6., nan], [nan, 3., 2., 1.], [nan, 6., 5., 4.], [nan, 3., 2., 1.], [nan, 6., 5., 4.]]) >>> outer.X.sum(axis=0) array([nan, 25., 23., nan]) >>> import pandas as pd >>> Xdf = pd.DataFrame(outer.X, columns=outer.var_names) >>> Xdf a b c d 0 1.0 2.0 3.0 NaN 1 4.0 5.0 6.0 NaN 2 NaN 3.0 2.0 1.0 3 NaN 6.0 5.0 4.0 4 NaN 3.0 2.0 1.0 5 NaN 6.0 5.0 4.0 >>> Xdf.sum() a 5.0 b 25.0 c 23.0 d 10.0 dtype: float64 - One way to deal with missing values is to use masked arrays: - >>> from numpy import ma >>> outer.X = ma.masked_invalid(outer.X) >>> outer.X masked_array( data=[[1.0, 2.0, 3.0, --], [4.0, 5.0, 6.0, --], [--, 3.0, 2.0, 1.0], [--, 6.0, 5.0, 4.0], [--, 3.0, 2.0, 1.0], [--, 6.0, 5.0, 4.0]], mask=[[False, False, False, True], [False, False, False, True], [ True, False, False, False], [ True, False, False, False], [ True, False, False, False], [ True, False, False, False]], fill_value=1e+20) >>> outer.X.sum(axis=0).data array([ 5., 25., 23., 10.]) - The masked array is not saved but has to be reinstantiated after saving. - >>> outer.write('./test.h5ad') >>> from anndata import read_h5ad >>> outer = read_h5ad('./test.h5ad') >>> outer.X array([[ 1., 2., 3., nan], [ 4., 5., 6., nan], [nan, 3., 2., 1.], [nan, 6., 5., 4.], [nan, 3., 2., 1.], [nan, 6., 5., 4.]]) - For sparse data, everything behaves similarly, except that for - join='outer', zeros are added.- >>> from scipy.sparse import csr_matrix >>> adata1 = AnnData( ... csr_matrix([[0, 2, 3], [0, 5, 6]], dtype=np.float32), ... dict(obs_names=['s1', 's2'], anno1=['c1', 'c2']), ... dict(var_names=['a', 'b', 'c']), ... ) >>> adata2 = AnnData( ... csr_matrix([[0, 2, 3], [0, 5, 6]], dtype=np.float32), ... dict(obs_names=['s3', 's4'], anno1=['c3', 'c4']), ... dict(var_names=['d', 'c', 'b']), ... ) >>> adata3 = AnnData( ... csr_matrix([[1, 2, 0], [0, 5, 6]], dtype=np.float32), ... dict(obs_names=['s5', 's6'], anno2=['d3', 'd4']), ... dict(var_names=['d', 'c', 'b']), ... ) >>> adata = adata1.concatenate(adata2, adata3, join='outer') >>> adata.var_names Index(['a', 'b', 'c', 'd'], dtype='object') >>> adata.X.toarray() array([[0., 2., 3., 0.], [0., 5., 6., 0.], [0., 3., 2., 0.], [0., 6., 5., 0.], [0., 0., 2., 1.], [0., 6., 5., 0.]], dtype=float32)