Jupyter Integration (Notebook Embedding)#
This tutorial shows how to load data into Cellucid from inside a notebook (JupyterLab, classic Jupyter, VSCode notebooks, etc.).
You will learn:
how to display a pre-exported dataset with show() (fastest + most reproducible)
how to display an AnnData / .h5ad / .zarr directly with show_anndata() (most convenient for analysis)
how to work with vector fields (velocity/drift overlays) in notebooks
how the integration works under the hood (so you can debug it)
how to drive the viewer from Python (highlight/color/visibility) and react to UI events (hooks)
If you are not in a notebook environment, start with Server Mode (CLI + Python) — Recommended for Large Datasets.
At A Glance#
Audience
Wet lab / beginner: copy/paste the “Minimal cells” sections and focus on “What success looks like”.
Computational users: focus on backed mode, dataset sizes, vector fields, and cleanup.
Power users: focus on remote/HPC workflows, hooks, and debugging endpoints.
Time
Minimal working embed: ~5 minutes
Full read (hooks + troubleshooting): ~20–30 minutes
Prerequisites
A Jupyter environment (classic notebook, JupyterLab, or VSCode notebooks)
pip install cellucid
Important
Network requirement: Cellucid serves the viewer UI via a hosted-asset proxy.
On first run (or after the website changes), Cellucid downloads the viewer UI (index.html + /assets/*) from https://www.cellucid.com and caches it on disk.
In notebooks, Cellucid prefetches the UI cache with a clear progress bar (so beginners aren’t left staring at a blank iframe).
Cache invalidation is driven by the web app’s <meta name="cellucid-web-build-id" ...> stamp.
Notebook embeds then load the UI from your local Cellucid server (either direct loopback or via a notebook proxy), avoiding HTTPS→HTTP mixed-content blocking.
If you are offline and no cached UI is available, the iframe will show a “Cellucid viewer UI could not be loaded” page with next steps.
In Jupyter mode, Cellucid intentionally does not fall back to demo data if it cannot reach the Python-side server; it shows a connectivity error so you don’t accidentally analyze the wrong dataset.
Configure the cache location with CELLUCID_WEB_PROXY_CACHE_DIR.
Clear the cache (force a re-download) with cellucid.clear_web_cache() or viewer.clear_web_cache().
Manually prefetch the full UI asset cache with viewer.ensure_web_ui_cached() (usually not needed).
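If you want the UI cache to survive kernel restarts, or to live on a shared or scratch filesystem, you can point CELLUCID_WEB_PROXY_CACHE_DIR at a persistent directory. The helper below is a minimal sketch and its name, use_persistent_web_cache, is hypothetical. It assumes Cellucid reads the variable when it needs the cache location, so the safest place to call it is before importing cellucid.

```python
import os
from pathlib import Path


def use_persistent_web_cache(cache_dir):
    """Create cache_dir if needed and export it as CELLUCID_WEB_PROXY_CACHE_DIR.

    Hypothetical helper: call it before `import cellucid` so the setting is
    already in place when Cellucid looks up its hosted-asset cache.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)  # make sure the directory exists
    os.environ["CELLUCID_WEB_PROXY_CACHE_DIR"] = str(path)
    return path
```

Typical use is to call use_persistent_web_cache("~/.cache/cellucid-web") in the first cell and then import cellucid.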
Minimal Cells (Copy/Paste)#
If you just want it to work, copy/paste one of these flows and then come back for details.
Minimal: show an in-memory AnnData#
from cellucid import show_anndata
viewer = show_anndata(adata, height=600)
viewer # (optional) display again in some notebook UIs
Minimal: show a .h5ad / .zarr (recommended for large datasets)#
from cellucid import show_anndata
viewer = show_anndata("data.h5ad", height=600) # backed mode by default
# viewer = show_anndata("data.zarr", height=600)
Minimal: show a pre-exported dataset directory#
from cellucid import show
viewer = show("./exports/pbmc_demo", height=600)
Note
If you don’t have an export directory yet, create one with cellucid.prepare(...).
See Local & Remote Demo (Share Without Running a Server) for a complete export workflow.
How It Works (Mental Model)#
When you call show(...) or show_anndata(...), Cellucid does two things:
Starts a local data server (usually 127.0.0.1:<some_port>).
This server reads your data and exposes a small HTTP API (e.g. /points_3d.bin, /obs_manifest.json, /dataset_identity.json).
The server is intentionally localhost-bound in Jupyter mode for safety (it is not meant to be public).
Displays an iframe in your notebook pointing at the same server:
Local notebooks often use direct loopback:
http://127.0.0.1:<port>/?jupyter=true&viewerId=<id>&viewerToken=<token>
HTTPS/remote notebooks use Jupyter Server Proxy (recommended):
https://<notebook-origin>/<base>/proxy/<port>/?jupyter=true&viewerId=<id>&viewerToken=<token>
The viewer UI and the dataset API share the same origin, so the viewer loads data from relative paths (no mixed-content).
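The sketch below assembles the two URL shapes from their parts to make them concrete. The function embed_url and its parameters are hypothetical. Cellucid builds these URLs itself and exposes the result as viewer.viewer_url, so this code only mirrors the patterns listed above.

```python
from urllib.parse import urlencode


def embed_url(port, viewer_id, viewer_token, notebook_origin=None, base=""):
    """Assemble an embed URL following the two patterns above (illustrative only).

    Without notebook_origin: direct loopback, as used by local notebooks.
    With notebook_origin: Jupyter Server Proxy path <origin>/<base>/proxy/<port>/.
    """
    query = urlencode({"jupyter": "true", "viewerId": viewer_id, "viewerToken": viewer_token})
    if notebook_origin is None:
        return f"http://127.0.0.1:{port}/?{query}"
    prefix = base.strip("/")
    path = f"/{prefix}/proxy/{port}/" if prefix else f"/proxy/{port}/"
    return f"{notebook_origin.rstrip('/')}{path}?{query}"


print(embed_url(8765, "abc", "tok"))
# -> http://127.0.0.1:8765/?jupyter=true&viewerId=abc&viewerToken=tok
print(embed_url(8765, "abc", "tok", notebook_origin="https://hub.example.org", base="/user/alice/"))
# -> https://hub.example.org/user/alice/proxy/8765/?jupyter=true&viewerId=abc&viewerToken=tok
```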
Why this matters#
If your notebook kernel is local, 127.0.0.1:<port> is your laptop and everything “just works”.
In Google Colab, the kernel runs on a remote VM; Cellucid uses Colab’s built-in HTTPS port proxy so the embed still works (your viewer.viewer_url won’t look like 127.0.0.1 in that case).
If your notebook is served from a remote/HTTPS Jupyter server (common on JupyterHub), Cellucid embeds via Jupyter Server Proxy so the browser can still reach the kernel-side server port.
If your kernel is remote but your notebook frontend cannot use a server proxy (e.g. unusual VSCode/webview setups), you may still need SSH port forwarding (see Remote / HPC Notebooks (SSH Tunneling Guide)).
Debugging endpoints you can open in a browser#
Once viewer exists:
print(viewer.server_url) # e.g. http://127.0.0.1:8765
print(viewer.viewer_url) # the embedded viewer URL (usually same as server_url + query params)
Then try:
http://127.0.0.1:<port>/_cellucid/health (server alive?)
http://127.0.0.1:<port>/dataset_identity.json (dataset id + vector fields metadata)
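You can run the same checks from the kernel side with the standard library. This is a minimal sketch; the helper name probe is hypothetical and only the two endpoint paths come from the text above. In remote or proxied setups the kernel can often reach the server even when your browser cannot. A passing probe therefore rules out a dead server, but it does not rule out a browser-side networking problem.

```python
import json
import urllib.request

ENDPOINTS = ("/_cellucid/health", "/dataset_identity.json")


def probe(server_url, endpoints=ENDPOINTS, timeout=5.0):
    """GET each debugging endpoint and summarize the result per path."""
    results = {}
    for path in endpoints:
        url = server_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[path] = {"ok": True, "status": resp.status, "json": json.loads(resp.read())}
        except (OSError, ValueError) as exc:
            # URLError/HTTPError are OSError subclasses; invalid JSON raises ValueError.
            results[path] = {"ok": False, "error": repr(exc)}
    return results
```

In a notebook you would call probe(viewer.server_url) and inspect the returned dictionary.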
Choose show() vs show_anndata()#
| Function | Best for | What you pass | Performance |
|---|---|---|---|
| show() | Fast, reproducible viewing | a pre-exported directory | Best |
| show_anndata() | Convenience in analysis workflows | an AnnData object, a .h5ad file, or a .zarr directory | Good (but slower than exports) |
If you’re preparing a dataset for collaborators or repeated viewing, prefer prepare() + show().
Option #12 — show() (Pre-exported Dataset)#
Step 1 — Create an export (one-time)#
Use cellucid.prepare(...) to create an export directory.
For a complete workflow (including GitHub sharing), see Local & Remote Demo (Share Without Running a Server).
Step 2 — Show it in a notebook#
from cellucid import show
viewer = show("./exports/pbmc_demo", height=600)
What success looks like#
An interactive Cellucid viewer appears directly in the output cell.
You can pan/zoom/select cells.
If your export includes vector fields, the overlay can be enabled (see Vector Fields in Notebooks (Velocity/Drift Overlay)).
Advanced: choose a fixed port (useful for SSH tunneling)#
The convenience function show(...) auto-picks a port. If you need a fixed port:
from cellucid import CellucidViewer
viewer = CellucidViewer("./exports/pbmc_demo", port=8765, height=600)
viewer.display()
Note
Fixed ports are especially useful for remote notebooks, because you can pre-configure an SSH tunnel to 127.0.0.1:8765.
Option #13/#14 — show_anndata() (AnnData / .h5ad / .zarr)#
show_anndata() is the fastest way to get started in an analysis notebook.
It supports:
an in-memory AnnData
a .h5ad file (opened in backed mode by default for lazy loading)
a .zarr directory (chunked storage)
Minimal examples#
from cellucid import show_anndata
viewer = show_anndata(adata)
viewer = show_anndata("data.h5ad")
viewer = show_anndata("data.zarr")
Useful kwargs#
show_anndata(..., **kwargs) forwards extra options to the AnnData adapter (and to the viewer constructor when relevant):
port: fix the local server port (critical for remote/HPC; see Remote / HPC Notebooks (SSH Tunneling Guide))
latent_key: choose the latent space in obsm (auto-detected if omitted)
gene_id_column: which var column to use as gene IDs (default: index)
normalize_embeddings: normalize coordinates to [-1, 1] (default: True)
dataset_name: label shown in the UI
dataset_id: set a stable dataset identity string (important for sessions; see Dataset identity (why it matters))
Tip
If you plan to use Cellucid sessions or share “the same dataset” across runs, set a stable dataset_id instead of relying on auto-generated IDs.
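One way to get an ID that is identical across runs, but changes if the cell order changes, is to hash a readable name together with the ordered cell names. Order sensitivity matters here because sessions are applied by row position. The helper stable_dataset_id below is a hypothetical sketch, not a Cellucid API. It assumes any short string is an acceptable dataset_id.

```python
import hashlib


def stable_dataset_id(name, obs_names):
    """Derive a deterministic dataset_id from a label and the ordered cell names."""
    digest = hashlib.sha256()
    for cell in obs_names:
        digest.update(str(cell).encode("utf-8"))
        digest.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
    return f"{name}-{digest.hexdigest()[:12]}"
```

Usage: show_anndata(adata, dataset_name="PBMC demo", dataset_id=stable_dataset_id("pbmc", adata.obs_names)).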
from cellucid import show_anndata
# In-memory AnnData
# viewer = show_anndata(adata, height=600)
# File-backed (recommended for large datasets)
# viewer = show_anndata("/path/to/data.h5ad", height=600)
# viewer = show_anndata("/path/to/data.zarr", height=600)
# With options
# viewer = show_anndata(
# "data.h5ad",
# height=700,
# latent_key="X_pca",
# gene_id_column="gene_symbols",
# dataset_name="PBMC demo",
# )
Vector Fields in Notebooks (Velocity/Drift Overlay)#
Vector fields are optional per-cell displacement vectors (e.g. RNA velocity, drift, directed transitions) that Cellucid can render as an animated overlay on top of your embedding.
This matters for data loading because:
the overlay is only available if the server advertises vector_fields in dataset_identity.json
vector fields must be aligned to the same cells and the same embedding basis/dimension
Quick checklist (AnnData)#
To make a vector field appear when using show_anndata(...), you need:
A UMAP embedding in adata.obsm (X_umap_2d / X_umap_3d, or a compatible X_umap)
A vector field in adata.obsm under a Cellucid-compatible key
The vector array shape must be (n_cells, dim), where dim matches the embedding (2 or 3)
Naming convention (UMAP basis)#
Cellucid detects vector fields in adata.obsm using keys like:
Explicit (preferred):
velocity_umap_2d (shape (n_cells, 2))
velocity_umap_3d (shape (n_cells, 3))
T_fwd_umap_2d (shape (n_cells, 2))
Implicit (allowed, but explicit is clash-safe):
velocity_umap (shape (n_cells, 2) or (n_cells, 3))
For full expectations (including exported-folder layout), see Folder / file format expectations (high-level; link to spec).
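Before launching the viewer, you can check that your obsm keys and shapes follow these rules. The helper below is a hypothetical pre-flight check that encodes only the naming and shape rules listed above. Cellucid's own detection may accept or reject other cases, so dataset_identity.json remains the final word.

```python
import re

# Naming patterns from the convention above (sketch; Cellucid's detection may differ).
EXPLICIT = re.compile(r"^(?P<prefix>.+)_umap_(?P<dim>[23])d$")  # e.g. velocity_umap_2d
IMPLICIT = re.compile(r"^(?P<prefix>.+)_umap$")                 # e.g. velocity_umap


def umap_dims(obsm):
    """Return the set of UMAP embedding dimensions present in obsm."""
    dims = {dim for dim in (2, 3) if f"X_umap_{dim}d" in obsm}
    if "X_umap" in obsm:
        dims.add(obsm["X_umap"].shape[1])
    return dims


def check_vector_fields(obsm, n_obs):
    """Map each candidate vector-field key to a list of problems ([] means it looks OK)."""
    available = umap_dims(obsm)
    report = {}
    for key, arr in obsm.items():
        match = EXPLICIT.match(key) or IMPLICIT.match(key)
        if match is None or match.group("prefix") == "X":
            continue  # not a vector-field key, or it is the embedding itself
        shape = tuple(arr.shape)
        if len(shape) != 2:
            report[key] = [f"expected a 2-D array, got shape {shape}"]
            continue
        declared = match.groupdict().get("dim")  # None for implicit keys
        dim = int(declared) if declared else shape[1]
        problems = []
        if shape[0] != n_obs:
            problems.append(f"{shape[0]} rows but the dataset has {n_obs} cells")
        if shape[1] != dim:
            problems.append(f"key says {dim}D but the array has {shape[1]} columns")
        if dim not in (2, 3):
            problems.append(f"{dim} columns; only 2D or 3D vectors are supported")
        elif dim not in available:
            problems.append(f"no {dim}D UMAP embedding in obsm to match")
        report[key] = problems
    return report
```

With AnnData you would call check_vector_fields(adata.obsm, adata.n_obs). Any key that maps to a non-empty list needs attention.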
Minimal example: attach a 2D vector field#
import numpy as np
from cellucid import show_anndata
n = adata.n_obs
adata.obsm["velocity_umap_2d"] = np.zeros((n, 2), dtype=np.float32)  # placeholder: replace with real per-cell vectors
viewer = show_anndata(adata)
Example: compute a drift field from a transition matrix (CellRank-style)#
If you have a transition matrix T and UMAP coordinates, Cellucid ships helpers:
from cellucid import add_transition_drift_to_obsm, show_anndata
# Adds e.g. "T_fwd_umap_2d" into adata.obsm (key depends on dim/basis)
out_key = add_transition_drift_to_obsm(adata, T, basis="umap", field_prefix="T_fwd")
print("Wrote:", out_key)
viewer = show_anndata(adata)
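A common way to turn a transition matrix T into per-cell displacement vectors is to take the expected move under T: drift_i = sum_j T_ij (x_j - x_i). The NumPy sketch below implements that formula to give intuition for what such a field contains. The function name transition_drift is hypothetical. The function is illustrative and may differ from how add_transition_drift_to_obsm normalizes or scales its output.

```python
import numpy as np


def transition_drift(T, X):
    """drift_i = sum_j T[i, j] * (X[j] - X[i]) for every cell i.

    T: (n_cells, n_cells) transition matrix (dense ndarray or scipy.sparse).
    X: (n_cells, dim) embedding coordinates, e.g. a 2D UMAP.
    """
    X = np.asarray(X, dtype=np.float64)
    expected_position = np.asarray(T @ X)                # sum_j T_ij * x_j
    row_mass = np.asarray(T.sum(axis=1)).reshape(-1, 1)  # sum_j T_ij (1 if row-stochastic)
    return (expected_position - row_mass * X).astype(np.float32)
```

Storing the result under a key such as T_fwd_umap_2d (for a 2D UMAP) follows the naming convention described above.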
Verify that the server is advertising vector fields#
After viewer = show_anndata(...):
import json, urllib.request
with urllib.request.urlopen(viewer.server_url + "/dataset_identity.json") as f:
ident = json.load(f)
print("Has vector_fields?", "vector_fields" in ident)
If the overlay is missing in the UI and vector_fields is absent:
double-check your adata.obsm keys and shapes
confirm you have a matching embedding dimension (2D vs 3D)
see Troubleshooting (data loading) and Troubleshooting (velocity overlay)
Programmatic Control (Python → Viewer)#
Once you have a viewer, you can drive UI state from Python.
Common actions:
# Highlight a few cells (indices are 0-based row indices into your dataset)
viewer.highlight_cells([0, 10, 42], color="#ff0000")
# Color by an obs column (must exist in the dataset)
viewer.set_color_by("cell_type")
# Hide some cells (or set visible=True to show them again)
viewer.set_visibility([0, 10, 42], visible=False)
# Reset camera
viewer.reset_view()
Important
Cell indices refer to the row order Cellucid is serving.
For show_anndata(adata), this is the current adata row order.
If you subset or shuffle adata in Python, the indices will change.
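If you know cells by name (for example barcodes from another analysis), translate the names into 0-based row positions before passing them to highlight_cells or set_visibility. The helper positions_of is a hypothetical sketch. Compute positions against the same object you passed to show_anndata, because those are the rows Cellucid is serving.

```python
def positions_of(obs_names, wanted):
    """Translate cell names (e.g. barcodes) into 0-based row positions."""
    index = {}
    for pos, name in enumerate(obs_names):
        index.setdefault(name, pos)  # keep the first occurrence if names repeat
    missing = [name for name in wanted if name not in index]
    if missing:
        raise KeyError(f"{len(missing)} cell(s) not in the served dataset, e.g. {missing[:3]}")
    return [index[name] for name in wanted]
```

Usage: viewer.highlight_cells(positions_of(adata.obs_names, my_barcodes), color="#ff0000").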
Reacting to the UI (Hooks: Viewer → Python)#
The viewer can send events back to Python so your notebook can react to selection/hover/click.
Supported hooks:
@viewer.on_ready
@viewer.on_selection
@viewer.on_hover
@viewer.on_click
@viewer.on_message (raw debugging)
Minimal: print selections#
@viewer.on_selection
def handle_selection(event):
print("Selected:", len(event["cells"]))
Practical: analyze the selected cells#
@viewer.on_selection
def analyze(event):
cells = event["cells"]
subset = adata[cells].copy()
print(subset)
# e.g. run Scanpy plots or downstream analysis on `subset`
Debug: print all messages#
@viewer.on_message
def debug(event):
print(event)
For more about hooks, see Jupyter Hooks System (Python ↔ Frontend).
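To keep several selections around, for example to compare two groups you lasso in turn, you can register a small callable object as the selection handler. The class SelectionLog is a hypothetical sketch. It assumes only what the examples above show: the handler receives an event dict whose "cells" entry lists 0-based indices.

```python
class SelectionLog:
    """Callable hook handler that stores each selection's cell indices in order."""

    def __init__(self):
        self.selections = []

    def __call__(self, event):
        cells = list(event.get("cells", []))
        self.selections.append(cells)
        print(f"selection #{len(self.selections)}: {len(cells)} cells")

    def last(self, n=1):
        """Return the n most recent selections, oldest first."""
        return self.selections[-n:]
```

Because @viewer.on_selection is used as a plain decorator, registering an instance should look like log = SelectionLog(); viewer.on_selection(log). This assumes on_selection accepts any callable. Afterwards, first, second = log.last(2) gives you two index lists you can pass to adata[first] and adata[second].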
Pulling State into Python (No-Download Sessions)#
Hooks are great when you want reactive code. For “pull-style” workflows, Cellucid also exposes:
viewer.state (live snapshot)#
viewer.state is a small, thread-safe snapshot of the latest events:
viewer.wait_for_ready(timeout=60)
print(viewer.state.selection) # last selection event (or None)
print(viewer.state.hover) # last hover event (or None)
print(viewer.state.click) # last click event (or None)
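A pull-style cell often needs to block until the user has made a selection. The helper wait_until below is a hypothetical polling sketch that relies only on viewer.state.selection being None until a selection event arrives. It also assumes events reach the background server thread while the cell waits, since hooks travel as HTTP POSTs to /_cellucid/events. If a selection already exists, the helper returns it immediately.

```python
import time


def wait_until(get_value, timeout=120.0, interval=0.25):
    """Poll get_value() until it returns something other than None, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no value after {timeout} s")
        time.sleep(interval)
```

Usage: selection = wait_until(lambda: viewer.state.selection, timeout=300).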
Session bundle (durable saved state → AnnData)#
In Jupyter, you can request the current .cellucid-session as a Python object:
viewer.wait_for_ready(timeout=60)
bundle = viewer.get_session_bundle(timeout=60)
# Apply to AnnData (adds obs/var columns; stores metadata in adata.uns["cellucid"])
adata2 = bundle.apply_to_anndata(adata, inplace=False)
Convenience one-liner:
adata2 = viewer.apply_session_to_anndata(adata, inplace=False)
Important
Session application is currently index-based (cell identity is the row position).
Only apply a session to an AnnData whose row order matches the dataset that produced the session.
Debugging: viewer.debug_connection()#
If hooks/session capture seem “stuck”, run:
report = viewer.debug_connection()
report
This checks server endpoints (/_cellucid/health, /_cellucid/info), performs a ping/pong roundtrip, and includes recent frontend console warnings/errors forwarded to Python.
It also includes a frontend “debug snapshot” (the iframe’s location.href, origin, and user agent), which is useful in proxied notebook environments.
Figure: In Jupyter, Cellucid appears as an interactive iframe inside the notebook output.
Cleanup (Do This If You Re-run Cells Often)#
Each viewer starts a local server in the background.
If you create many viewers and never stop them, you can accumulate background servers.
If you re-run a notebook cell repeatedly, you may see port increments (8765, 8766, 8767, …).
Recommended pattern#
viewer = show_anndata(...)
try:
# ... use it ...
pass
finally:
viewer.stop() # stop server + cleanup
Note
viewer.stop() freezes the viewer frame (best-effort) so the notebook output stays visually identical to the last interactive state, but becomes non-interactive.
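The try/finally pattern can be packaged as a context manager so you cannot forget the cleanup. The helper running below is a hypothetical sketch. Because stop() freezes the frame, this pattern fits cells that do their work programmatically, such as capturing a session bundle. It does not suit cells where you still want to interact with the viewer afterwards.

```python
from contextlib import contextmanager


@contextmanager
def running(viewer):
    """Yield the viewer and always call viewer.stop() on exit, even if the block raises."""
    try:
        yield viewer
    finally:
        viewer.stop()
```

Usage: with running(show_anndata("data.h5ad", height=600)) as viewer: followed by an indented body.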
Stop everything created in this kernel#
from cellucid.jupyter import cleanup_all
cleanup_all()
Remote / HPC Notebooks (SSH Tunneling Guide)#
If your kernel runs on a remote machine (HPC/JupyterHub/cloud VM) but your browser is on your laptop, you must ensure the browser can reach the remote kernel’s Cellucid server.
When the notebook proxy route (see Why this matters) is not available, the robust solution is SSH local port forwarding:
Pick a port you will use for the Cellucid data server (example: 8765).
Start the viewer with that port in the notebook:
from cellucid import show_anndata
viewer = show_anndata("data.h5ad", port=8765, height=600)
print(viewer.viewer_url)
On your laptop, create an SSH tunnel that forwards that same local port to the remote machine:
ssh -N -L 8765:127.0.0.1:8765 <user>@<remote-host>
Now, when your browser loads http://127.0.0.1:8765/?jupyter=true&...,
it hits your laptop’s 127.0.0.1:8765, which SSH forwards to the remote kernel’s Cellucid server.
Important
If you do not set a fixed port, Cellucid will pick the first available port (8765, 8766, …). That makes remote tunneling awkward because you need to update your SSH forwarding every time.
Common remote variants#
You already SSH-tunnel your Jupyter server: you can forward both ports in one command (example ports shown):
ssh -N \
  -L 8888:127.0.0.1:8888 \
  -L 8765:127.0.0.1:8765 \
  <user>@<remote-host>
VSCode Remote / Remote-SSH: use VSCode port forwarding for the Cellucid port as well (the viewer still needs it on localhost).
JupyterHub: Cellucid normally embeds via Jupyter Server Proxy; if that is not available, use a fixed port and ask your admin whether extra forwarding is needed.
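If you forward several ports, such as Jupyter plus one fixed port per viewer, a tiny helper can print the matching command to paste into a terminal on your laptop. The function ssh_tunnel_command is a hypothetical sketch. It assumes each server is bound to 127.0.0.1 on the remote host, as described above, and forwards each local port to the same port number remotely.

```python
import shlex


def ssh_tunnel_command(destination, ports, remote_host="127.0.0.1"):
    """Build an `ssh -N -L ...` command forwarding each local port to the same remote port."""
    args = ["ssh", "-N"]
    for port in ports:
        args += ["-L", f"{port}:{remote_host}:{port}"]
    args.append(destination)
    return shlex.join(args)  # quotes any argument that needs it (Python 3.8+)


print(ssh_tunnel_command("alice@hpc.example.org", [8888, 8765]))
# -> ssh -N -L 8888:127.0.0.1:8888 -L 8765:127.0.0.1:8765 alice@hpc.example.org
```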
Common Edge Cases#
No internet access: the first time the viewer UI loads (or after a web UI update), the server must fetch https://www.cellucid.com/index.html + /assets/*. If the cache is empty, the iframe shows a “what to do” page; run once while online (or use a persistent CELLUCID_WEB_PROXY_CACHE_DIR).
Notebook blocks iframes (security policy): you may need to open viewer.viewer_url in a new browser tab.
Port exhaustion: if many ports are in use, Cellucid may fail to find a free one.
Corporate proxies / ad blockers: can block cross-origin requests or event POSTs (hooks).
Huge in-memory AnnData: can exhaust kernel RAM; prefer .h5ad backed mode or .zarr.
Troubleshooting (Massive)#
This section is intentionally redundant and explicit: it is designed for “I need to fix this now”.
Symptom: “The viewer doesn’t appear (blank output cell)”#
Likely causes (ordered)
You are not actually running in a Jupyter environment (e.g. plain Python script).
The notebook blocks iframes (security policy).
The viewer UI assets could not be loaded (hosted-asset proxy blocked and no cached copy available).
How to confirm
Print the viewer URL:
print(viewer.viewer_url)
Open that URL in a normal browser tab.
Fix
Ensure you are using Jupyter/JupyterLab/VSCode notebooks.
If you have outbound HTTPS, confirm the kernel/runtime can reach https://www.cellucid.com (used by the hosted-asset proxy).
If you are offline, run once while online to populate the hosted-asset proxy cache (and consider setting CELLUCID_WEB_PROXY_CACHE_DIR to a persistent path).
If iframes are blocked, open the URL manually in a new tab.
Symptom: “The viewer loads, but it says it cannot connect / everything is empty”#
Likely causes
The local data server is not reachable from your browser.
common in remote/HPC notebooks without tunneling
also happens if you used a port that is blocked locally
How to confirm
Open viewer.server_url + "/_cellucid/health" in a browser.
Check that you get JSON back (status ok).
Fix
Local notebook: restart the kernel and re-run, then try again.
Remote/HPC: follow Remote / HPC Notebooks (SSH Tunneling Guide) (set a fixed port + SSH forward it).
Symptom: “Port already in use / it keeps picking new ports”#
Likely causes
Old viewers still running from earlier cells.
Some other process is using the default port range.
Fix
Call viewer.stop() when done.
Or run cleanup_all().
Restart the kernel if needed.
If you need a stable port, set port=... on show_anndata(...) (or use CellucidViewer(..., port=...)).
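To see which ports in the default range are already taken, whether by old viewers or by other processes, you can try binding each one. The function busy_ports is a hypothetical stdlib sketch. The start of 8765 and count of 10 are illustrative and based on the port numbers mentioned above. The result is only a snapshot, because a port that is free now can be taken a moment later.

```python
import socket


def busy_ports(start=8765, count=10, host="127.0.0.1"):
    """Return ports in [start, start + count) that cannot be bound on host right now."""
    busy = []
    for port in range(start, start + count):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:  # typically EADDRINUSE: something already holds the port
                busy.append(port)
    return busy


print(busy_ports())
```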
Symptom: “show_anndata says no UMAP embeddings”#
Likely causes
Your AnnData lacks obsm['X_umap_2d'] / obsm['X_umap_3d'] (or a compatible obsm['X_umap']).
How to confirm
print(adata.obsm.keys())
Fix
Compute UMAP and store it under one of the supported keys.
Symptom: “Vector field overlay is missing (no toggle / no fields)”#
Likely causes
Your vectors are not stored under a detected key (e.g. the _umap suffix is missing).
Shape mismatch: vectors are (n_cells, 2) but you are viewing in 3D (or vice versa).
You have vectors, but their row count doesn’t match adata.n_obs (e.g. adata was filtered or subset after the vectors were computed).
How to confirm
Inspect dataset_identity.json:
import json, urllib.request
with urllib.request.urlopen(viewer.server_url + "/dataset_identity.json") as f:
    ident = json.load(f)
print(ident.get("vector_fields"))
Fix
Follow the naming rules in Vector Fields in Notebooks (Velocity/Drift Overlay) (and Folder / file format expectations (high-level; link to spec)).
If you have both 2D and 3D points, make sure you provide the matching vector dimension.
Symptom: “Gene search returns nothing / wrong gene IDs”#
Likely causes
Your var_names are Ensembl IDs but you’re searching symbols (or vice versa).
Your gene IDs are in a different var column.
Fix
Pass gene_id_column="..." to show_anndata().
Or export with prepare(var_gene_id_column="...") and use show().
Symptom: “Hooks don’t fire (selection events never reach Python)”#
Likely causes
Requests from the viewer to /_cellucid/events are blocked by a proxy/ad blocker.
The local server is unreachable from the browser (hooks need HTTP POST).
How to confirm
Register a raw handler:
@viewer.on_message
def debug(event):
    print(event)
Open browser devtools → Network and look for requests to
/_cellucid/events.
Fix
Ensure the data server is reachable (/_cellucid/health works).
Temporarily disable extensions/ad blockers for cellucid.com.
Symptom: “It’s slow compared to pre-exported data”#
Explanation
Direct AnnData mode is designed for convenience, not maximum speed.
Fix
Export once with prepare() and use show().
For large .h5ad files, prefer .zarr when feasible.
Next Steps#
Stable dataset identity + sessions: Dataset identity (why it matters)
File/key expectations (including vector fields): Folder / file format expectations (high-level; link to spec)
General loading troubleshooting: Troubleshooting (data loading)
Vector field overlay usage: Vector Field / Velocity Overlay (GPU Particle Overlay)
Hooks deep dive: Jupyter Hooks System (Python ↔ Frontend)
If you want to share datasets publicly: export + GitHub workflow (Local & Remote Demo (Share Without Running a Server))