anndata.experimental.backed.Dataset2D

class anndata.experimental.backed.Dataset2D(ds)

Bases: Mapping[Hashable, DataArray | Self]

A wrapper class that enables working with lazy dataframe data through AnnData’s internal API. This class ensures that the “dataframe-invariants” are respected, namely that there is exactly one 1D dimension, with a coordinate of the same name, as in a pandas.DataFrame.
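To make the invariants above concrete, here is a minimal sketch that checks them on a plain xarray.Dataset. It does not use Dataset2D itself, and the helper name `satisfies_dataframe_invariants` and the dimension name `obs` are hypothetical, chosen only for illustration; the checks the class actually performs may differ.

```python
import numpy as np
import xarray as xr


def satisfies_dataframe_invariants(ds: xr.Dataset) -> bool:
    """Hypothetical check: one 1D dimension with a coordinate of the same name."""
    if len(ds.sizes) != 1:
        return False
    (dim,) = ds.sizes  # the lone dimension name
    return dim in ds.coords and ds.coords[dim].ndim == 1


# One dimension ("obs") and a coordinate with the same name: dataframe-like.
table_like = xr.Dataset(
    {"n_genes": ("obs", np.array([10, 20, 30]))},
    coords={"obs": ["cell_0", "cell_1", "cell_2"]},
)

# Two dimensions: cannot be read as a dataframe with a single index.
grid_like = xr.Dataset({"counts": (("obs", "var"), np.zeros((3, 2)))})

print(satisfies_dataframe_invariants(table_like))
print(satisfies_dataframe_invariants(grid_like))
```

The first dataset passes the check and the second does not, because it has two dimensions.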

You should not have to instantiate this class yourself. Setting an xarray.Dataset into a relevant part of the AnnData object will attempt to wrap that object in this class, trying to enforce the “dataframe-invariants.”

Because xarray requires xarray.Dataset.coords to be in-memory, this class provides handling for an out-of-memory index via true_index. This feature speeds up loading remote data when the index itself (e.g., cell ids) is not initially needed to construct the object.

Attributes

columns

AnnData internally looks for columns so this ensures usability.

Returns:

pandas.Index that represents the “columns.”

ds

The underlying xarray.Dataset.

dtypes

Return a Mapping with the dtypes of the variables in the Dataset2D.

iloc

AnnData internally looks for iloc so this ensures usability.

Returns:

Handler class for doing the iloc-style indexing using isel().
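As an illustration of what isel()-based positional indexing looks like, the sketch below calls xarray.Dataset.isel() directly on a small dataset. It does not go through Dataset2D.iloc; the dimension name `obs` and the variable names are assumptions made for the example.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"n_genes": ("obs", np.array([10, 20, 30, 40]))},
    coords={"obs": ["cell_0", "cell_1", "cell_2", "cell_3"]},
)

# Positional selection along the single dimension, analogous to df.iloc[1:3].
subset = ds.isel(obs=slice(1, 3))

print(subset["obs"].values.tolist())      # ['cell_1', 'cell_2']
print(subset["n_genes"].values.tolist())  # [20, 30]
```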

index

A pandas.Index object corresponding to anndata.experimental.backed.Dataset2D.index_dim.

AnnData internally looks for index so this ensures usability.

Returns:

The index of the dataframe as resolved from coords.

index_dim

The underlying computational index i.e., the lone coordinate dimension.

is_backed

Whether the object is backed, i.e., whether any of its data is not held in memory. Must be set externally; defaults to False.

shape

AnnData internally looks for shape so this ensures usability.

Returns:

The (2D) shape of the dataframe resolved from sizes.
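One plausible reading of “resolved from sizes” is sketched below on a plain xarray.Dataset: the row count is the size of the lone dimension and the column count is the number of data variables. This is an assumption for illustration, not a statement of how the property is implemented; the names `obs`, `n_genes`, and `batch` are hypothetical.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "n_genes": ("obs", np.array([10, 20, 30])),
        "batch": ("obs", np.array(["a", "b", "a"])),
    },
    coords={"obs": ["cell_0", "cell_1", "cell_2"]},
)

# Rows come from the size of the lone dimension; columns are assumed here to
# be the data variables (the index coordinate is not counted).
shape = (ds.sizes["obs"], len(ds.data_vars))
print(shape)  # (3, 2)
```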

true_index

true_xr_index as a pandas.Index.

true_index_dim

Key of the “true” index.

Because xarray loads its coordinates/indexes in memory, we allow for signaling that a given variable, which is not a coordinate, is the “true” index.

For example, the true index may be cell names but loading these over an internet connection may not be desirable or necessary for most use cases such as getting a quick preview of the columns or loading only one column that isn’t the index.

This property is the key of said variable. The default is index_dim if this variable has not been set.

true_xr_index

The index AnnData is actually interested in e.g., cell names, for verification.

xr_index

The coordinate of anndata.experimental.backed.Dataset2D.index_dim.

Methods

copy(data=None, *, deep=False)

Return a copy of the Dataset2D object. See xarray.Dataset.copy() for more information.

Return type:

Dataset2D

equals(b)

Thin wrapper around xarray.Dataset.equals().

Return type:

bool
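For reference, the wrapped xarray.Dataset.equals() compares dimensions, coordinates, and values. The sketch below calls it directly on plain datasets rather than through Dataset2D; the variable and dimension names are made up for the example.

```python
import numpy as np
import xarray as xr

a = xr.Dataset(
    {"n_genes": ("obs", np.array([10, 20]))},
    coords={"obs": ["cell_0", "cell_1"]},
)
b = a.copy(deep=True)  # same dims, coords, and values
c = xr.Dataset(
    {"n_genes": ("obs", np.array([10, 99]))},  # one value differs
    coords={"obs": ["cell_0", "cell_1"]},
)

same = a.equals(b)
different = a.equals(c)
print(same, different)  # True False
```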

keys()

Return type:

list[Hashable]

reindex(index=None, axis=0, fill_value=nan)

Reindex the current object against a new index.

Parameters:
index Index | None (default: None)

The new index for reindexing, by default None

axis Literal[0] (default: 0)

Provided for API consistency; should not be called with axis != 0, by default 0

fill_value Any | None (default: nan)

The value with which to fill in via pandas.Series.reindex(), by default np.nan

Return type:

Dataset2D

Returns:

Reindexed dataset.
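The effect of reindexing with a fill value can be sketched on a plain xarray.Dataset using xarray.Dataset.reindex(). This is only an illustration of the general behaviour (reordering existing labels and filling missing ones); it is not the Dataset2D.reindex implementation, and the names `obs`, `score`, and `cell_9` are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"score": ("obs", np.array([1.0, 2.0, 3.0]))},
    coords={"obs": ["cell_0", "cell_1", "cell_2"]},
)

# "cell_9" is absent from the original index, so its row is filled.
new_index = pd.Index(["cell_2", "cell_0", "cell_9"])
reindexed = ds.reindex({"obs": new_index}, fill_value=np.nan)

print(reindexed["obs"].values.tolist())  # ['cell_2', 'cell_0', 'cell_9']
print(reindexed["score"].values)         # 3.0, 1.0, then NaN
```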

to_memory(*, copy=False)

Converts to pandas.DataFrame. The index of the dataframe comes from true_index_dim if it differs from index_dim.

Parameters:
copy bool (default: False)

Unused argument.

Return type:

DataFrame

Returns:

pandas.DataFrame with index set accordingly.
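The idea of a “true” index that differs from the computational index can be sketched with plain xarray and pandas. Below, the coordinate `obs` is a cheap integer range while a regular data variable, `cell_id`, holds the labels of interest; converting to a dataframe and then setting that variable as the index mimics the described result. All names are hypothetical, and this is not how to_memory() is necessarily implemented.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "cell_id": ("obs", np.array(["AAAC", "AAAG", "AACT"])),
        "n_genes": ("obs", np.array([10, 20, 30])),
    },
    coords={"obs": np.arange(3)},  # placeholder computational index
)

true_index_dim = "cell_id"  # hypothetical key of the "true" index variable

# Convert to pandas, then promote the "true" index variable to the index.
df = ds.to_dataframe().set_index(true_index_dim)

print(list(df.index))    # ['AAAC', 'AAAG', 'AACT']
print(list(df.columns))  # ['n_genes']
```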