anndata.experimental.concat_on_disk
- anndata.experimental.concat_on_disk(in_files, out_file, *, overwrite=False, max_loaded_elems=100000000, axis=0, join='inner', merge=None, uns_merge=None, label=None, keys=None, index_unique=None, fill_value=None, pairwise=False)
Concatenates multiple AnnData objects along a specified axis using their corresponding stores or paths, and writes the resulting AnnData object to a target location on disk.
Unlike the concat function, this method does not require loading the input AnnData objects into memory, making it a memory-efficient alternative for large datasets. The resulting object written to disk should be equivalent to the concatenation of the loaded AnnData objects using the concat function.
To adjust the maximum amount of data loaded into memory: for sparse arrays, use the max_loaded_elems argument; for dense arrays, see the Dask documentation, since this function uses the Dask concatenation function to concatenate dense arrays.
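A minimal usage sketch follows, built only from the signature above. The helper names (elems_for_budget, concat_batches), the file paths, and the 4 bytes per element figure are assumptions made for illustration; the byte size is inferred from the documented "100 million elements is roughly 400 MB" estimate and may not hold for every dtype.

```python
def elems_for_budget(budget_mb, bytes_per_elem=4):
    """Convert a memory budget in MB into an element count.

    bytes_per_elem=4 is an assumption inferred from the documented
    estimate that 100 million elements take roughly 400 MB.
    """
    return budget_mb * 1_000_000 // bytes_per_elem


def concat_batches(in_files, out_file, budget_mb=400):
    """Concatenate AnnData files on disk along axis 0 (hypothetical wrapper)."""
    # Imported lazily so the helper above works without anndata installed.
    from anndata.experimental import concat_on_disk

    concat_on_disk(
        in_files,          # mapping of batch name -> path; keys act as `keys`
        out_file,
        max_loaded_elems=elems_for_budget(budget_mb),
        join="inner",      # keep only variables shared by every input
        label="batch",     # add a "batch" column to the axis annotation
        index_unique="-",  # index entries become "{orig_idx}-{key}"
    )


# Hypothetical paths, shown for illustration only:
# concat_batches({"a": "a.h5ad", "b": "b.h5ad"}, "merged.h5ad")
```

Passing a mapping rather than a list lets the keys double as batch names, which is why keys is not set explicitly here.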
- Parameters:
- in_files
Collection[str | PathLike] | MutableMapping[str, str | PathLike]
The corresponding stores or paths of AnnData objects to be concatenated. If a Mapping is passed, keys are used for the keys argument and values are concatenated.
- out_file
str | PathLike
The target path or store to write the result in.
- overwrite
bool (default: False)
If False and a file already exists at out_file, an error is raised; if True, the existing file is overwritten.
- max_loaded_elems
int (default: 100000000)
The maximum number of elements to load into memory when concatenating sparse arrays. This count also includes empty entries. The default of 100 million means roughly 400 MB is loaded into memory at once.
- axis
Literal[0, 1] (default: 0)
Which axis to concatenate along.
- join
Literal['inner', 'outer'] (default: 'inner')
How to align values when concatenating. If “outer”, the union of the other axis is taken. If “inner”, the intersection. See concatenation for more.
- merge
Union[Literal['same', 'unique', 'first', 'only'], Callable[[Collection[Mapping]], Mapping], None] (default: None)
How elements not aligned to the axis being concatenated along are selected. Currently implemented strategies include:
None: No elements are kept.
“same”: Elements that are the same in each of the objects.
“unique”: Elements for which there is only one possible value.
“first”: The first element seen at each position.
“only”: Elements that show up in only one of the objects.
- uns_merge
Union[Literal['same', 'unique', 'first', 'only'], Callable[[Collection[Mapping]], Mapping], None] (default: None)
How the elements of .uns are selected. Uses the same set of strategies as the merge argument, except applied recursively.
- label
Optional[str] (default: None)
Column in axis annotation (i.e. .obs or .var) to place batch information in. If it’s None, no column is added.
- keys
Optional[Collection[str]] (default: None)
Names for each object being added. These values are used for column values for label or appended to the index if index_unique is not None. Defaults to incrementing integer labels.
- index_unique
Optional[str] (default: None)
Whether to make the index unique by using the keys. If provided, this is the delimiter between “{orig_idx}{index_unique}{key}”. When None, the original indices are kept.
- fill_value
Optional[Any] (default: None)
When join="outer", this is the value that will be used to fill the introduced indices. By default, sparse arrays are padded with zeros, while dense arrays and DataFrames are padded with missing values.
- pairwise
bool (default: False)
Whether pairwise elements along the concatenated dimension should be included. This is False by default, since the resulting arrays are often not meaningful.
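The four merge strategies are easiest to compare side by side. The sketch below is a simplified illustration on flat dictionaries of plain values; merge_flat and the sample data are hypothetical and this is not the library's implementation, which also handles arrays, DataFrames, and (for uns_merge) nested mappings.

```python
def merge_flat(dicts, strategy):
    """Merge flat dicts following the documented strategy names (simplified)."""
    if strategy is None:
        return {}  # None: no elements are kept
    if strategy not in ("same", "unique", "first", "only"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Collect keys in first-seen order.
    all_keys = []
    for d in dicts:
        for key in d:
            if key not in all_keys:
                all_keys.append(key)

    result = {}
    for key in all_keys:
        values = [d[key] for d in dicts if key in d]
        all_equal = all(v == values[0] for v in values)
        if strategy == "same":
            # Present in every object, and identical everywhere.
            keep = len(values) == len(dicts) and all_equal
        elif strategy == "unique":
            # Only one possible value among the objects that have it.
            keep = all_equal
        elif strategy == "first":
            keep = True  # take the first value seen
        else:  # "only"
            keep = len(values) == 1  # shows up in exactly one object
        if keep:
            result[key] = values[0]
    return result


# Hypothetical annotations from two objects.
batches = [
    {"genome": "hg38", "lab": "A"},
    {"genome": "hg38", "lab": "B", "note": "pilot"},
]

for strategy in (None, "same", "unique", "first", "only"):
    print(strategy, merge_flat(batches, strategy))
```

With this data, “same” keeps only genome, “unique” keeps genome and note, “first” keeps all three keys with lab taken from the first object, and “only” keeps just note.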
- Return type:
None
Notes
Warning
If you use join='outer', sparse data is filled with 0s wherever a variable is absent from a batch, so use this with care. Dense data is filled with NaN.
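To see why the warning matters, the sketch below mimics an outer join on small Python lists. It is a reasoning aid under stated assumptions, not the library's code: outer_join_rows, the gene names, and the counts are hypothetical, and the real ordering of the joined variables may differ.

```python
import math


def outer_join_rows(batches, fill_value):
    """Align rows from several batches on the union of their variables.

    batches: list of (var_names, rows) pairs, where each row is a list
    of values ordered like var_names.
    """
    # Union of variable names, in first-seen order.
    all_vars = []
    for var_names, _ in batches:
        for name in var_names:
            if name not in all_vars:
                all_vars.append(name)

    joined = []
    for var_names, rows in batches:
        column = {name: i for i, name in enumerate(var_names)}
        for row in rows:
            joined.append(
                [row[column[v]] if v in column else fill_value for v in all_vars]
            )
    return all_vars, joined


batch_a = (["g1", "g2"], [[5, 0]])  # g2 was measured and is truly 0
batch_b = (["g2", "g3"], [[7, 2]])  # g1 was never measured here

vars_, sparse_like = outer_join_rows([batch_a, batch_b], fill_value=0)
_, dense_like = outer_join_rows([batch_a, batch_b], fill_value=math.nan)

print(vars_)        # ['g1', 'g2', 'g3']
print(sparse_like)  # [[5, 0, 0], [0, 7, 2]]
print(dense_like)   # [[5, 0, nan], [nan, 7, 2]]
```

In the zero-filled result, the measured 0 for g2 in the first row cannot be told apart from the filled 0 for g3, whereas the NaN fill keeps the two cases distinct.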