cellucid_prepare() / prepare() Overview#
Audience: everyone
Time: 10–20 minutes
Goal: understand what the exporter does and how to call it correctly
cellucid::cellucid_prepare() is the main entry point of cellucid-r. It writes an export folder containing binaries + JSON manifests that the Cellucid web app can load.
cellucid::prepare() is an alias for the same function.
Minimal “mental model”#
You provide arrays/data.frames.
Cellucid normalizes/encodes them for the viewer.
It writes files to out_dir.
You open out_dir in the browser.
The function returns NULL (invisibly). It is called for side effects (writing files).
A minimal call (embeddings + obs only)#
cellucid::cellucid_prepare(
  latent_space = latent,        # (n_cells, n_latent_dims)
  obs = obs,                    # data.frame with n_cells rows
  X_umap_2d = umap2,            # (n_cells, 2)
  out_dir = "exports/my_dataset",
  force = TRUE
)
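To try the call above without real data, the inputs can be stood in for by small random objects. The sketch below is only an illustration: the toy values, the obs column names (cluster, n_counts) and the temporary output folder are hypothetical, and it assumes row i of every input refers to the same cell. The export itself runs only if the cellucid package is installed.

```r
set.seed(1)
n_cells <- 100

# Hypothetical toy inputs; replace these with your real data.
latent <- matrix(rnorm(n_cells * 10), nrow = n_cells)  # (n_cells, n_latent_dims)
umap2  <- matrix(rnorm(n_cells * 2),  nrow = n_cells)  # (n_cells, 2)
obs <- data.frame(
  cluster  = factor(sample(c("A", "B", "C"), n_cells, replace = TRUE)),
  n_counts = rpois(n_cells, lambda = 2000)
)

# Cheap shape checks before exporting: every input must have n_cells rows.
stopifnot(
  nrow(latent) == nrow(obs),
  nrow(umap2) == nrow(obs),
  ncol(umap2) == 2
)

out_dir <- file.path(tempdir(), "my_dataset")

# Only attempt the export when the package is available.
if (requireNamespace("cellucid", quietly = TRUE)) {
  cellucid::cellucid_prepare(
    latent_space = latent,
    obs = obs,
    X_umap_2d = umap2,
    out_dir = out_dir,
    force = TRUE
  )
}
```

Running the shape checks first tends to surface misaligned inputs before any files are written.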
What is required vs optional?#
Required#
latent_space (matrix-like, (n_cells, n_dims))
obs (data.frame with n_cells rows)
at least one embedding (X_umap_1d or X_umap_2d or X_umap_3d)
Optional (common)#
gene_expression + var (enables gene overlays/search)
connectivities (enables graph-based features)
vector_fields (enables velocity/displacement overlays)
Optional (performance/size knobs)#
compression (gzip level 1–9)
var_quantization (8/16-bit for gene expression)
obs_continuous_quantization (8/16-bit for continuous obs + outlier quantiles)
gene_identifiers (export a subset of genes)
obs_keys (export a subset of metadata columns)
Key behaviors you should know (before you export real data)#
1) Embeddings are normalized#
Each embedding is centered and scaled so it fits a stable range for rendering. Vector fields are scaled consistently with this normalization.
Details: Embeddings and Coordinates
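As a rough illustration of what "centered and scaled" can mean, here is a minimal base-R sketch. It is not the cellucid-r implementation: the choice of centering on the bounding-box midpoint and the target range [-1, 1] are assumptions made for the example, and normalize_embedding is a hypothetical helper name. The point it shows is that a vector field has to be divided by the same scale factor (but not shifted) to stay consistent with the normalized coordinates.

```r
# Sketch only: center on the bounding-box midpoint, then divide by the largest
# half-extent so that every coordinate lands in [-1, 1] (assumed target range).
normalize_embedding <- function(coords) {
  coords <- as.matrix(coords)
  lo <- apply(coords, 2, min)
  hi <- apply(coords, 2, max)
  center <- (lo + hi) / 2
  scale <- max((hi - lo) / 2)
  if (scale == 0) scale <- 1  # degenerate case: all points identical
  list(
    coords = sweep(coords, 2, center, "-") / scale,
    center = center,
    scale = scale
  )
}

set.seed(1)
umap2 <- matrix(rnorm(200, mean = 5, sd = 3), ncol = 2)
norm <- normalize_embedding(umap2)
range(norm$coords)  # stays within [-1, 1]

# Vectors are displacements: apply the same scale, but no centering shift.
velocity <- matrix(rnorm(200), ncol = 2)
velocity_scaled <- velocity / norm$scale
```

Using one scale factor for all axes preserves the aspect ratio of the embedding, which is why the sketch takes the maximum half-extent rather than scaling each column separately.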
2) latent_space is used for categorical “outlier quantiles”#
For each categorical field, cellucid-r computes a per-cell outlier quantile (distance rank inside its category in latent space). This requires latent_space.
Details: obs: Cell Metadata
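The idea of a "distance rank inside its category" can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: it measures Euclidean distance to the category centroid and rescales ranks to (0, 1], and outlier_quantile is a hypothetical helper name. The exact distance and ranking rules used by cellucid-r may differ.

```r
# Sketch: for each category, rank cells by distance to the category centroid in
# latent space and rescale the rank to (0, 1]. Values near 1 are the cells
# farthest from the rest of their category.
outlier_quantile <- function(latent, category) {
  latent <- as.matrix(latent)
  category <- as.factor(category)
  q <- rep(NA_real_, nrow(latent))  # cells with a missing category stay NA
  for (lvl in levels(category)) {
    idx <- which(category == lvl)
    if (length(idx) == 0) next
    members <- latent[idx, , drop = FALSE]
    centroid <- colMeans(members)
    d <- sqrt(rowSums(sweep(members, 2, centroid, "-")^2))
    q[idx] <- rank(d, ties.method = "average") / length(idx)
  }
  q
}

set.seed(1)
latent <- matrix(rnorm(60 * 5), nrow = 60)
cluster <- rep(c("A", "B"), each = 30)
q <- outlier_quantile(latent, cluster)
summary(q)
```

The sketch also makes the dependency visible: without latent_space there is nothing to measure distances in, so the per-category quantiles cannot be computed.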
3) Export can “silently skip” work unless force=TRUE#
By default force=FALSE. If manifests already exist, export may skip:
obs_manifest.json
var_manifest.json
connectivity_manifest.json
and embedding/vector files that already exist
This is useful for incremental work, but it can be confusing when you expect outputs to change.
Rule of thumb:
use a new out_dir for each export iteration, or
set force=TRUE while you are iterating.
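One way to avoid being surprised by skipped work is to look for leftover manifests before exporting. The base-R sketch below does only that; existing_manifests is a hypothetical helper, not part of cellucid-r, and the demo folder is a temporary directory created for the example.

```r
manifests <- c(
  "obs_manifest.json",
  "var_manifest.json",
  "connectivity_manifest.json"
)

# Hypothetical helper: which manifests are already present in out_dir?
existing_manifests <- function(out_dir) {
  manifests[file.exists(file.path(out_dir, manifests))]
}

# Demo on a fresh temporary folder.
out_dir <- file.path(tempdir(), "cellucid_force_demo")
unlink(out_dir, recursive = TRUE)
dir.create(out_dir, recursive = TRUE)
existing_manifests(out_dir)  # character(0): nothing to skip yet

# Simulate a leftover from an earlier export.
invisible(file.create(file.path(out_dir, "obs_manifest.json")))
stale <- existing_manifests(out_dir)
if (length(stale) > 0) {
  message(
    "Existing manifests (may be skipped when force = FALSE): ",
    paste(stale, collapse = ", ")
  )
}
```

If the helper reports anything and you expect the outputs to change, either export to a new folder or pass force = TRUE.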
4) Continuous quantization uses a reserved missing marker#
If you quantize continuous values:
valid values map to 0..254 (8-bit) or 0..65534 (16-bit)
invalid values (NA, Inf, -Inf) map to 255 / 65535
Details: obs: Cell Metadata and Gene Expression Matrix
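The reserved-marker scheme can be sketched in a few lines of base R. The code ranges and the marker value come from the description above; the linear min-max mapping and the helper name quantize_8bit are assumptions made for illustration, so the package's actual scaling may differ.

```r
# Sketch of 8-bit quantization with a reserved missing marker.
# Valid values are mapped linearly onto 0..254 (assumed min-max scaling);
# NA, NaN, Inf and -Inf all become 255.
quantize_8bit <- function(x) {
  valid <- is.finite(x)
  codes <- rep(255L, length(x))  # start everything at the missing marker
  if (any(valid)) {
    lo <- min(x[valid])
    hi <- max(x[valid])
    span <- hi - lo
    if (span == 0) {
      codes[valid] <- 0L  # constant column: a single code is enough
    } else {
      codes[valid] <- as.integer(round((x[valid] - lo) / span * 254))
    }
  }
  codes
}

x <- c(0, 0.5, 1, NA, Inf, -Inf)
quantize_8bit(x)
# 0 127 254 255 255 255
```

Because the top code is reserved, a viewer can tell "missing" apart from "maximum value" without a separate mask. The 16-bit case works the same way with 65534 as the largest valid code and 65535 as the marker.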
5) Gene expression export is dense-per-gene#
Even if your input expression matrix is sparse, each exported gene file is a dense vector of length n_cells. This is the main reason large exports can be huge.
Details and mitigation strategies: Gene Expression Matrix and Performance Tuning (Prepare/Export)
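A back-of-the-envelope estimate makes the size issue concrete. The sketch below multiplies cells by genes by bytes per value, before gzip compression. The 1- and 2-byte widths follow from 8/16-bit quantization; the 4-byte width for unquantized values is an assumption, and the dataset dimensions are made up for the example.

```r
# Uncompressed size of a dense-per-gene export: one value per cell per gene.
estimate_gene_export_bytes <- function(n_cells, n_genes, bytes_per_value) {
  as.numeric(n_cells) * as.numeric(n_genes) * bytes_per_value
}

n_cells <- 500000  # hypothetical dataset size
n_genes <- 20000

sizes_gb <- c(
  unquantized_4_bytes = estimate_gene_export_bytes(n_cells, n_genes, 4),  # assumed width
  quantized_16_bit    = estimate_gene_export_bytes(n_cells, n_genes, 2),
  quantized_8_bit     = estimate_gene_export_bytes(n_cells, n_genes, 1)
) / 1e9
sizes_gb
# 40 20 10
```

Exporting 2,000 selected genes instead of 20,000 shrinks each figure tenfold, which is why restricting the gene set and quantizing are the usual first levers.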
What files are written?#
At minimum (obs + embeddings):
points_*d.bin[.gz]
obs_manifest.json
obs/*
dataset_identity.json
If you include gene expression:
var_manifest.json
var/*
If you include connectivities:
connectivity_manifest.json
connectivity/edges.src.bin[.gz]
connectivity/edges.dst.bin[.gz]
If you include vector fields:
vectors/*.bin[.gz]
metadata inside dataset_identity.json
Full format spec: Export Directory Format (Specification)
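After an export, a quick sanity check is to confirm that the minimum set of files is present. The sketch below does this with base R only; check_minimum_export is a hypothetical helper, the patterns are derived from the file names listed above, and the demo folder with its placeholder file is fabricated so the example can run without cellucid-r.

```r
# Hypothetical helper: report which of the minimum outputs exist in out_dir.
check_minimum_export <- function(out_dir) {
  files <- list.files(out_dir, recursive = TRUE)
  c(
    points           = any(grepl("^points_[0-9]d\\.bin(\\.gz)?$", files)),
    obs_manifest     = "obs_manifest.json" %in% files,
    obs_files        = any(startsWith(files, "obs/")),
    dataset_identity = "dataset_identity.json" %in% files
  )
}

# Demo on a fabricated folder that mimics the minimum layout.
demo_dir <- file.path(tempdir(), "cellucid_layout_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(file.path(demo_dir, "obs"), recursive = TRUE)
invisible(file.create(file.path(demo_dir, c(
  "points_2d.bin.gz",
  "obs_manifest.json",
  "dataset_identity.json",
  "obs/placeholder.bin"  # hypothetical file name, for the demo only
))))

present <- check_minimum_export(demo_dir)
present
```

A FALSE entry points to the part of the export worth investigating first.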
Next steps#
Want the global alignment rules first? Input Requirements (Global)
Exporting from Seurat/SCE? Integrations & Recipes
Something failed? Troubleshooting: Prepare/Export