`cellucid_prepare()` Overview#

Audience: everyone
Time: 10–20 minutes
Goal: understand what the exporter does and how to call it correctly

cellucid::cellucid_prepare() is the main entry point of cellucid-r. It writes an export folder of binary files and JSON manifests that the Cellucid web app can load.

Minimal “mental model”#

You provide arrays/data.frames.
cellucid-r normalizes and encodes them for the viewer.
It writes files to out_dir.
You open out_dir in the browser.

The function returns NULL (invisibly). It is called for side effects (writing files).

A minimal call (embeddings + obs only)#

cellucid::cellucid_prepare(
  latent_space = latent,  # (n_cells, n_latent_dims)
  obs = obs,              # data.frame with n_cells rows
  X_umap_2d = umap2,      # (n_cells, 2)
  out_dir = "exports/my_dataset",
  force = TRUE
)
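If you want to try the call above without real data, the following sketch builds toy inputs with matching shapes. The object names follow the call above; the column names cluster and n_counts are placeholders chosen for illustration, not names the package requires.

```r
set.seed(1)
n_cells <- 100

# Toy inputs; real data would come from your own analysis pipeline.
latent <- matrix(rnorm(n_cells * 10), nrow = n_cells)  # (n_cells, n_latent_dims)
umap2  <- matrix(rnorm(n_cells * 2),  nrow = n_cells)  # (n_cells, 2)
obs <- data.frame(
  cluster  = factor(sample(c("A", "B", "C"), n_cells, replace = TRUE)),
  n_counts = runif(n_cells, min = 500, max = 5000)
)

# All inputs must describe the same cells, so the row counts must agree.
stopifnot(
  nrow(latent) == n_cells,
  nrow(umap2) == n_cells,
  ncol(umap2) == 2,
  nrow(obs) == n_cells
)
```

These objects can then be passed as latent_space, obs, and X_umap_2d.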

What is required vs optional?#

Required#

latent_space (matrix-like, shape (n_cells, n_latent_dims))
obs (data.frame with n_cells rows)
at least one embedding (X_umap_1d or X_umap_2d or X_umap_3d)

Optional (common)#

gene_expression + var (enables gene overlays/search)
connectivities (enables graph-based features)
vector_fields (enables velocity/displacement overlays)

Optional (performance/size knobs)#

compression (gzip level 1–9)
var_quantization (8/16-bit for gene expression)
obs_continuous_quantization (8/16-bit for continuous obs + outlier quantiles)
gene_identifiers (export a subset of genes)
obs_keys (export a subset of metadata columns)

Key behaviors you should know (before you export real data)#

1) Embeddings are normalized#

Each embedding is centered and scaled so it fits a stable range for rendering. Vector fields are scaled consistently with this normalization.

Details: Embeddings and Coordinates
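To make the idea concrete, here is a minimal sketch of one possible normalization. It assumes centering on the bounding-box midpoint and dividing by the largest half-range so that all axes fit in [-1, 1]; the exact rule cellucid-r applies may differ, and normalize_embedding is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: center on the bounding-box midpoint and divide by the
# largest half-range so every axis fits in [-1, 1] with the aspect ratio kept.
normalize_embedding <- function(coords) {
  rng <- apply(coords, 2, range)        # 2 x d matrix: row 1 = min, row 2 = max
  center <- (rng[1, ] + rng[2, ]) / 2
  scale <- max(rng[2, ] - rng[1, ]) / 2
  stopifnot(is.finite(scale), scale > 0)
  list(
    coords = sweep(coords, 2, center, "-") / scale,
    center = center,
    scale = scale
  )
}

umap2 <- cbind(c(0, 10, 4), c(-2, 2, 0))
norm <- normalize_embedding(umap2)
range(norm$coords)

# Vectors are displacements: divide by the same scale, but do not re-center.
velocity <- cbind(c(1, 0, -1), c(0.5, 0.5, 0))
velocity_scaled <- velocity / norm$scale
```

The point of the sketch is the last step: whatever scale is applied to the points has to be applied to the vectors too, otherwise arrows would be drawn at the wrong length relative to the embedding.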

2) `latent_space` is used for categorical “outlier quantiles”#

For each categorical field, cellucid-r computes a per-cell outlier quantile (distance rank inside its category in latent space). This requires latent_space.

Details: obs: Cell Metadata
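As an illustration of a within-category distance rank, the sketch below measures each cell's Euclidean distance to the centroid of its own category and converts the rank to a quantile. The choice of centroid and Euclidean distance is an assumption made for this example, and category_outlier_quantile is a hypothetical helper; the package's actual computation may differ.

```r
# Hypothetical helper: rank each cell by its distance to the centroid of its
# own category in latent space, then express the rank as a quantile in (0, 1].
category_outlier_quantile <- function(latent, category) {
  category <- as.factor(category)
  q <- rep(NA_real_, nrow(latent))     # cells with a missing category stay NA
  for (lv in levels(category)) {
    idx <- which(category == lv)
    if (length(idx) == 0) next
    members <- latent[idx, , drop = FALSE]
    centroid <- colMeans(members)
    d <- sqrt(rowSums(sweep(members, 2, centroid, "-")^2))
    q[idx] <- rank(d, ties.method = "average") / length(idx)
  }
  q
}

latent <- rbind(c(0, 0), c(0.1, 0), c(5, 5), c(1, 1), c(1.3, 1.1))
cluster <- c("A", "A", "A", "B", "B")
q <- category_outlier_quantile(latent, cluster)
```

In this toy data the third cell sits far from the other members of category A, so it receives the highest quantile in that category.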

3) Export publication is atomic#

By default, force=FALSE, and the call is rejected outright if out_dir already contains an export. With force=TRUE, cellucid-r writes and validates a complete staged export in a sibling location, then atomically replaces the previous export. If the staged export fails, the previous export is left unchanged.

Rule of thumb:

use a new out_dir for each export iteration, or
set force=TRUE only for an intentional complete replacement.
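One way to follow the first option is to derive a fresh folder name per run. The sketch below uses a timestamp suffix; new_export_dir and the naming scheme are hypothetical conveniences for this example, not part of cellucid-r.

```r
# Hypothetical naming scheme: one timestamped folder per export iteration.
new_export_dir <- function(root, name, time = Sys.time()) {
  file.path(root, paste0(name, "_", format(time, "%Y%m%d-%H%M%S")))
}

out_dir <- new_export_dir("exports", "my_dataset")

# Fail early if the folder already exists instead of relying on force = TRUE.
stopifnot(!dir.exists(out_dir))
```

The resulting out_dir can be passed to cellucid_prepare() with the default force=FALSE.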

4) Continuous and gene values are finite-only#

If you quantize continuous values:

valid values map to 0..254 (8-bit) or 0..65534 (16-bit)
NA, NaN, and infinite values cause the entire export to be rejected
constant fields are rejected because compact quantization requires minValue < maxValue

The reserved 255/65535 marker is used only for missing categorical codes and generated NaN categorical outlier quantiles.

Details: obs: Cell Metadata and Gene Expression Matrix
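The rules above can be sketched as a small linear quantizer. This is an illustrative assumption about how values could map onto the stated code ranges (linear scaling between the minimum and maximum, with rounding); quantize_continuous is a hypothetical helper and not the package's implementation.

```r
# Hypothetical helper illustrating 8-bit or 16-bit linear quantization.
quantize_continuous <- function(x, bits = 8L) {
  stopifnot(bits %in% c(8L, 16L))
  if (!all(is.finite(x))) {
    stop("non-finite values (NA, NaN, Inf) are not allowed")
  }
  min_value <- min(x)
  max_value <- max(x)
  if (!(min_value < max_value)) {
    stop("constant field: quantization requires minValue < maxValue")
  }
  # The top code (255 or 65535) is left unused because it is reserved.
  max_code <- if (bits == 8L) 254L else 65534L
  codes <- as.integer(round((x - min_value) / (max_value - min_value) * max_code))
  list(codes = codes, min_value = min_value, max_value = max_value)
}

q <- quantize_continuous(c(0, 1, 2), bits = 8L)
q$codes   # 0 127 254
```

Checking your own columns with all(is.finite(x)) and min(x) < max(x) before exporting is a cheap way to catch both rejection cases early.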

5) Gene expression export is dense-per-gene#

Even if your input expression matrix is sparse, each exported gene file is a dense vector of length n_cells. This is the main reason large exports can be huge.

Details and mitigation strategies: Gene Expression Matrix and Performance Tuning (Prepare/Export)
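A rough back-of-the-envelope estimate shows why. The sketch below computes the uncompressed size of a dense per-gene export; the 4 bytes per value for unquantized data is an assumption (32-bit floats), and gzip compression would reduce the on-disk size further.

```r
# Uncompressed size of a dense per-gene export: one value per cell per gene.
estimate_var_bytes <- function(n_cells, n_genes, bytes_per_value) {
  as.numeric(n_cells) * n_genes * bytes_per_value
}

n_cells <- 1e6
n_genes <- 2e4

estimate_var_bytes(n_cells, n_genes, 4) / 1e9   # assumed float32: 80 GB
estimate_var_bytes(n_cells, n_genes, 1) / 1e9   # 8-bit quantized: 20 GB
estimate_var_bytes(n_cells, 2000, 1) / 1e9      # 8-bit, 2,000 genes: 2 GB
```

The arithmetic suggests that exporting fewer genes (gene_identifiers) and quantizing (var_quantization) are the two largest levers on export size.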

What files are written?#

At minimum (obs + embeddings):

points_*d.bin[.gz]
obs_manifest.json
obs/*
dataset_identity.json

If you include gene expression:

var_manifest.json
var/*

If you include connectivities:

connectivity_manifest.json
connectivity/edges.src.bin[.gz]
connectivity/edges.dst.bin[.gz]
connectivity/edges.weights.f64.bin[.gz]

If you include vector fields:

vectors/*.bin[.gz]
vector field metadata inside dataset_identity.json

Full format spec: Export Directory Format (Specification)
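After an export, a quick sanity check is to confirm that the minimum set of files listed above is present. The sketch below uses only base R; check_export_dir is a hypothetical helper, and the demo folder with empty placeholder files (including the made-up name obs/cluster.bin) exists only to show the check running.

```r
# Hypothetical helper: report which of the minimum expected outputs exist.
check_export_dir <- function(out_dir) {
  files <- list.files(out_dir, recursive = TRUE)
  c(
    points = any(grepl("^points_.*d\\.bin(\\.gz)?$", files)),
    obs_manifest = "obs_manifest.json" %in% files,
    obs_files = any(startsWith(files, "obs/")),
    dataset_identity = "dataset_identity.json" %in% files
  )
}

# Demo on a throwaway folder with empty placeholder files.
demo_dir <- file.path(tempdir(), "cellucid_demo_export")
dir.create(file.path(demo_dir, "obs"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir,
  c("points_2d.bin.gz", "obs_manifest.json", "dataset_identity.json", "obs/cluster.bin")
)))

check_export_dir(demo_dir)
```

This only checks that files exist; it says nothing about whether their contents are valid.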

Next steps#

Want the global alignment rules first? Input Requirements (Global)
Exporting from Seurat/SCE? Integrations & Recipes
Something failed? Troubleshooting: Prepare/Export
