Adapters (AnnData → Cellucid data model)#
Adapters are the “server-side glue” that let Cellucid serve AnnData directly (without exporting first).
Most users never need to instantiate an adapter manually: use show_anndata() (notebook) or serve_anndata() / cellucid serve … (browser tab).
If you’re debugging, extending, or integrating Cellucid into custom servers, AnnDataAdapter is the primary public adapter.
Fast path (for developers)#
from cellucid import AnnDataAdapter
adapter = AnnDataAdapter(adata) # in-memory
# or: adapter = AnnDataAdapter.from_file("data.h5ad")
identity = adapter.get_dataset_identity()
obs_manifest = adapter.get_obs_manifest()
print(identity.get("name"), len(obs_manifest.get("fields", [])))
adapter.close()
Practical path (what an adapter does)#
It emulates the exported on-disk format#
The web viewer expects files like:
dataset_identity.json
obs_manifest.json
points_2d.bin, points_3d.bin
var/<gene>.values.f32.bin
In AnnData mode, the adapter serves these as virtual endpoints computed from AnnData on demand.
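To make the idea of virtual endpoints concrete, here is a minimal sketch of how a requested file path could be routed to adapter calls instead of files on disk. `StubAdapter`, `GENE_ROUTE`, and `resolve` are hypothetical names used only for illustration; the file-name patterns come from the list above, but how Cellucid's server actually dispatches requests is an assumption.

```python
import json
import re
import struct


class StubAdapter:
    """Hypothetical stand-in for an adapter; returns tiny fake payloads."""

    def get_dataset_identity(self):
        return {"name": "demo"}

    def get_gene_expression(self, gene_id):
        # Pretend every gene has three cells' worth of float32 values.
        return struct.pack("<3f", 0.0, 1.5, 2.0)


# Matches paths such as "var/GAPDH.values.f32.bin" and captures the gene ID.
GENE_ROUTE = re.compile(r"^var/(?P<gene>[^/]+)\.values\.f32\.bin$")


def resolve(adapter, path):
    """Map a requested file path to bytes computed on demand."""
    if path == "dataset_identity.json":
        return json.dumps(adapter.get_dataset_identity()).encode("utf-8")
    match = GENE_ROUTE.match(path)
    if match:
        return adapter.get_gene_expression(match.group("gene"))
    raise KeyError(f"no virtual endpoint for {path!r}")


adapter = StubAdapter()
identity_bytes = resolve(adapter, "dataset_identity.json")
gene_bytes = resolve(adapter, "var/GAPDH.values.f32.bin")
```

The viewer sees the same paths it would request from an exported dataset; only the source of the bytes changes.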
Lazy loading behavior (important for large datasets)#
.h5ad can be served in backed mode so gene expression columns are fetched on demand.
.zarr is inherently chunked/lazy.
In-memory AnnData uses whatever you already loaded into RAM.
API reference#
- class cellucid.AnnDataAdapter(adata, latent_key=None, gene_id_column='index', normalize_embeddings=True, centroid_outlier_quantile=0.95, centroid_min_points=10, dataset_name=None, dataset_id=None)[source]#
Bases: object
Adapter that wraps AnnData and provides data in Cellucid format.
This adapter generates all the data that would normally be created by prepare, but reads directly from AnnData without creating intermediate files. This is slower but more convenient for interactive use.
- __init__(adata, latent_key=None, gene_id_column='index', normalize_embeddings=True, centroid_outlier_quantile=0.95, centroid_min_points=10, dataset_name=None, dataset_id=None)[source]#
Initialize the adapter.
- Parameters:
adata (AnnData) – AnnData object to adapt. Can be in-memory or backed (h5ad file).
latent_key (str, optional) – Key in obsm for latent space used for outlier quantile calculation. If None, attempts to find: ‘X_pca’, ‘X_scvi’, ‘scanvi’, ‘scvi’, or first obsm.
gene_id_column (str) – Column in var for gene identifiers. Use “index” for var.index.
normalize_embeddings (bool) – If True, normalize embeddings to [-1, 1] range (recommended).
centroid_outlier_quantile (float) – Quantile for outlier removal in centroid computation.
centroid_min_points (int) – Minimum points per category for centroid computation.
dataset_name (str, optional) – Human-readable dataset name.
dataset_id (str, optional) – Dataset identifier.
- classmethod from_file(path, backed='r', **kwargs)[source]#
Create adapter from h5ad file or zarr store with lazy loading.
Supports both:
.h5ad files: HDF5-based, supports true backed mode with memory-mapping
.zarr directories: Directory-based, arrays are loaded on-demand
Lazy Loading Behavior#
- h5ad (backed mode):
When backed=’r’, the file is memory-mapped. Only accessed data is loaded into RAM. This is ideal for large datasets. The X matrix and layer arrays support lazy column/row access.
- zarr:
Zarr stores individual arrays as separate files on disk. While anndata.read_zarr() loads the AnnData structure (obs, var metadata), the actual X matrix data is loaded lazily when accessed. This is because zarr’s internal chunking mechanism defers loading until data is requested. Note: zarr does not support the same backed mode API as h5ad, but achieves similar lazy behavior through its design.
- Parameters:
path (str or Path) – Path to h5ad file or zarr directory. For h5ad: path/to/file.h5ad. For zarr: path/to/store.zarr (must be a directory).
backed (bool or 'r' or 'r+') – For h5ad only: 'r' is read-only backed mode (recommended for visualization); 'r+' is read-write backed mode; True is the same as 'r'; False loads the entire file into memory. For zarr, this parameter is ignored (zarr is always lazy).
**kwargs – Additional arguments passed to AnnDataAdapter.__init__: latent_key (key in obsm for latent space), gene_id_column (column in var for gene IDs), normalize_embeddings (normalize UMAP to [-1, 1]), dataset_name (human-readable name).
- Returns:
Adapter instance wrapping the loaded data.
- Return type:
AnnDataAdapter
- Raises:
FileNotFoundError – If the path does not exist.
ValueError – If the path is not a valid h5ad or zarr store.
Examples
>>> # Load h5ad with lazy loading (default)
>>> adapter = AnnDataAdapter.from_file("data.h5ad")
>>> # Load h5ad fully into memory
>>> adapter = AnnDataAdapter.from_file("data.h5ad", backed=False)
>>> # Load zarr store
>>> adapter = AnnDataAdapter.from_file("data.zarr")
- property is_backed: bool#
Whether the AnnData is backed (lazy loading from disk).
Returns False if the adapter is closed or if adata is None.
- get_embedding(dim)[source]#
Get embedding coordinates for a dimension.
Returns normalized Float32 array of shape (n_cells, dim).
- get_embedding_3d(dim)[source]#
Get embedding padded to 3D for WebGL rendering.
1D -> (x, 0, 0)
2D -> (x, y, 0)
3D -> (x, y, z)
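The padding rule can be illustrated with a small dependency-free sketch. `pad_to_3d` is a hypothetical helper working on plain tuples; the adapter itself operates on arrays and its internals are not shown here.

```python
def pad_to_3d(rows):
    """Pad each coordinate row with zeros so it has exactly three components."""
    padded = []
    for row in rows:
        if not 1 <= len(row) <= 3:
            raise ValueError(f"expected 1 to 3 components, got {len(row)}")
        # Append as many zeros as needed to reach three components.
        padded.append(tuple(row) + (0.0,) * (3 - len(row)))
    return padded


points_1d = pad_to_3d([(0.5,), (-0.25,)])
points_2d = pad_to_3d([(0.5, 1.0)])
points_3d = pad_to_3d([(0.5, 1.0, -1.0)])
```

Padding keeps a single vertex layout on the rendering side regardless of the embedding's dimensionality.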
- get_vector_field_binary(field_id, dim, compress=False)[source]#
Get a per-cell vector field (displacement vectors) as binary float32 data.
Vector fields are scaled by the SAME per-dimension normalization scale as the embedding points, so they are in the same normalized space as the points_{dim}d.bin responses.
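Why the scale must be shared is easiest to see in a small example. The sketch below assumes a particular normalization scheme (center on the bounding-box midpoint, divide by the largest half-extent); that scheme and the helper names are assumptions for illustration, not the adapter's documented formula. The point it demonstrates is that displacement vectors take the same scale as the points but no offset.

```python
def fit_normalization(points):
    """Return (center, scale) mapping points into [-1, 1] (assumed scheme)."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    center = [(lo + hi) / 2.0 for lo, hi in zip(lows, highs)]
    # One shared scale (largest half-extent) keeps the aspect ratio intact.
    scale = max((hi - lo) / 2.0 for lo, hi in zip(lows, highs)) or 1.0
    return center, scale


def normalize_points(points, center, scale):
    return [tuple((x - c) / scale for x, c in zip(p, center)) for p in points]


def normalize_vectors(vectors, scale):
    # Displacements are translation-invariant, so only the scale applies.
    return [tuple(v / scale for v in vec) for vec in vectors]


points = [(0.0, 0.0), (10.0, 4.0)]
vectors = [(5.0, 2.0), (-5.0, 0.0)]
center, scale = fit_normalization(points)
norm_points = normalize_points(points, center, scale)
norm_vectors = normalize_vectors(vectors, scale)
```

With this setup, adding a normalized vector to its normalized point lands on the same location as normalizing the displaced point directly, which is what keeps arrows aligned with the rendered cells.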
- get_obs_field_kind(key)[source]#
Determine if an obs field is continuous or categorical.
Classification rules:
Categorical dtype → category
Boolean dtype → category
Numeric dtype → continuous
String/object → category (treated as labels)
Empty column → category (safe default)
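The ordering of these rules matters, which the following simplified sketch tries to show. It works on plain Python values rather than pandas dtypes, so it has no equivalent of the categorical-dtype rule, and `classify_values` is a hypothetical helper rather than the adapter's implementation.

```python
def classify_values(values):
    """Classify a column of plain Python values as 'category' or 'continuous'."""
    present = [v for v in values if v is not None]
    if not present:
        return "category"  # empty column: safe default
    # bool is a subclass of int, so it must be tested before the numeric check.
    if all(isinstance(v, bool) for v in present):
        return "category"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "continuous"
    return "category"  # strings and mixed objects are treated as labels


kinds = {
    "n_counts": classify_values([1200, 3400.5, 980]),
    "is_doublet": classify_values([True, False, True]),
    "cell_type": classify_values(["T cell", "B cell"]),
    "empty": classify_values([]),
}
```

Checking booleans before numbers avoids treating a True/False flag as a continuous 0/1 variable.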
- get_obs_continuous_values(key, compress=False)[source]#
Get continuous obs field as binary float32 data.
NaN/Inf values are preserved in the output (client handles visualization).
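As a rough illustration of what "binary float32 with NaN/Inf preserved" means for a consumer, the sketch below packs and unpacks values with the standard library. Little-endian byte order and the helper names are assumptions, and the compress option is not modeled.

```python
import math
import struct


def encode_float32(values):
    """Pack values as little-endian float32, leaving NaN/Inf untouched."""
    return struct.pack(f"<{len(values)}f", *values)


def decode_float32(payload):
    """Unpack a float32 payload back into Python floats."""
    count = len(payload) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", payload))


payload = encode_float32([0.5, float("nan"), float("inf"), -2.0])
decoded = decode_float32(payload)
# A client can decide how to display non-finite entries, e.g. by filtering them.
finite = [v for v in decoded if math.isfinite(v)]
```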
- get_obs_categorical_codes(key, compress=False)[source]#
Get categorical obs field as binary codes.
- Returns:
(binary_codes, category_list, missing_value)
Categories are assigned codes 0 to n-1. Missing values (NaN) are encoded as the missing_value sentinel.
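The encoding scheme can be sketched as follows. The code width (one unsigned byte), the sentinel value 255, and the sorted category order are all assumptions chosen for the example; `encode_categories` is a hypothetical helper, and the adapter may pick a different dtype or sentinel.

```python
import math
import struct


def encode_categories(values, missing_value=255):
    """Return (binary_codes, category_list, missing_value) for a label column."""

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    categories = sorted({v for v in values if not is_missing(v)})
    # Codes run from 0 to n-1, so n may not exceed the sentinel value.
    if len(categories) > missing_value:
        raise ValueError("too many categories for a uint8 code with this sentinel")
    lookup = {label: code for code, label in enumerate(categories)}
    codes = [missing_value if is_missing(v) else lookup[v] for v in values]
    return struct.pack(f"<{len(codes)}B", *codes), categories, missing_value


binary_codes, category_list, missing = encode_categories(
    ["B cell", "T cell", float("nan"), "B cell"]
)
```

A client decodes by indexing `category_list` with each code and treating any code equal to `missing` as "no label".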
- get_obs_outlier_quantiles(key, compress=False)[source]#
Get outlier quantiles as binary float32 data.
- get_gene_expression(gene_id, compress=False)[source]#
Get expression values for a single gene as binary float32.
- close()[source]#
Close the adapter and release all resources.
This method:
1. Clears all caches to free memory (embedding, centroid, CSC, gene expression)
2. Closes the underlying file handle for backed h5ad files
3. Marks the adapter as closed to prevent further operations
Safe to call multiple times. Always call this method when done with the adapter, or use the context manager:
with AnnDataAdapter.from_file("data.h5ad") as adapter:
    identity = adapter.get_dataset_identity()  # use the adapter here
# automatically cleaned up
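The "safe to call multiple times" contract combined with context-manager support follows a common pattern, sketched here with a hypothetical `ClosableResource` class. It is not the adapter's code, only an illustration of an idempotent close() wired into `with`.

```python
class ClosableResource:
    """Hypothetical resource holder illustrating an idempotent close()."""

    def __init__(self):
        self.cache = {"embedding": [0.0, 1.0]}
        self.closed = False
        self.close_calls = 0

    def close(self):
        self.close_calls += 1
        if self.closed:
            return  # already released; calling again is a no-op
        self.cache.clear()
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions from the with-body


with ClosableResource() as resource:
    cached_inside = dict(resource.cache)

resource.close()  # a second call is harmless
```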
Memory Released#
Embedding cache (normalized UMAP coordinates)
Centroid cache (computed label centroids)
Outlier quantile cache
Gene expression LRU cache (up to 100 gene columns)
CSC matrix cache (for CSR->CSC converted matrices)
Latent space array
Gene ID lookup indices
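For readers unfamiliar with bounded LRU caches like the gene expression cache listed above, here is a minimal sketch. `GeneColumnCache` and `fake_loader` are hypothetical; the limit of 100 comes from the list above, while the eviction details are an assumption about how such a cache typically behaves.

```python
from collections import OrderedDict


class GeneColumnCache:
    """Tiny LRU cache; the real adapter's cache internals may differ."""

    def __init__(self, loader, max_size=100):
        self._loader = loader
        self._max_size = max_size
        self._entries = OrderedDict()

    def get(self, gene_id):
        if gene_id in self._entries:
            self._entries.move_to_end(gene_id)  # mark as most recently used
            return self._entries[gene_id]
        column = self._loader(gene_id)
        self._entries[gene_id] = column
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
        return column

    def clear(self):
        self._entries.clear()

    def __len__(self):
        return len(self._entries)


loads = []


def fake_loader(gene_id):
    loads.append(gene_id)  # record every cache miss
    return [0.0, 1.0]


cache = GeneColumnCache(fake_loader, max_size=2)
cache.get("A")
cache.get("B")
cache.get("A")  # hit: "A" becomes most recently used
cache.get("C")  # miss: evicts "B", the least recently used
cache.get("B")  # miss again: "B" has to be reloaded
```

Calling `clear()` on such a cache is the kind of step close() performs to release memory.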
Edge cases (do not skip)#
If your embedding keys are missing or have unexpected shapes, the adapter cannot serve points_*d.bin.
Duplicate gene IDs can make gene lookup ambiguous; prefer stable, unique identifiers.
If adata.X is CSR, the adapter may materialize a CSC copy for efficient column access (memory trade-off).
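The CSR-to-CSC trade-off is easier to reason about with the conversion written out. The sketch below is a dependency-free illustration with hypothetical helper names; in practice a library routine such as scipy.sparse's `tocsc()` would do this work, and the adapter's actual conversion path is not shown here.

```python
def csr_to_csc(n_rows, n_cols, indptr, indices, data):
    """Convert CSR arrays to CSC arrays (col_ptr, row_idx, values)."""
    counts = [0] * n_cols
    for col in indices:
        counts[col] += 1
    col_ptr = [0] * (n_cols + 1)
    for col in range(n_cols):
        col_ptr[col + 1] = col_ptr[col] + counts[col]
    next_slot = col_ptr[:-1].copy()  # next free write position per column
    row_idx = [0] * len(data)
    values = [0.0] * len(data)
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            slot = next_slot[indices[k]]
            row_idx[slot] = row
            values[slot] = data[k]
            next_slot[indices[k]] += 1
    return col_ptr, row_idx, values


def csc_column(col_ptr, row_idx, values, col, n_rows):
    """Read one dense column by scanning a single contiguous range."""
    dense = [0.0] * n_rows
    for k in range(col_ptr[col], col_ptr[col + 1]):
        dense[row_idx[k]] = values[k]
    return dense


# 3 cells x 2 genes: [[1, 0], [0, 2], [3, 4]]
indptr, indices, data = [0, 1, 2, 4], [0, 1, 0, 1], [1.0, 2.0, 3.0, 4.0]
col_ptr, row_idx, values = csr_to_csc(3, 2, indptr, indices, data)
gene_1 = csc_column(col_ptr, row_idx, values, col=1, n_rows=3)
```

The converted arrays hold a second copy of every nonzero value, which is the memory cost; the benefit is that a gene column becomes one contiguous slice instead of a scan over every row.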
Troubleshooting (symptom → diagnosis → fix)#
Symptom: “Gene expression lookup is very slow”#
Fix:
Prefer serving a backed .h5ad or .zarr over in-memory dense matrices.
For repeated access, export with prepare() instead.
Symptom: “No embeddings detected”#
Fix:
Ensure you have an embedding in adata.obsm with a supported key (e.g. X_umap, X_umap_2d, X_umap_3d).
See also#
Server (browser tab + local HTTP server) for AnnData servers
Export / Data Preparation (prepare) for creating exported datasets