Analysis mode: Differential Expression (DE) (Page A vs Page B)#

Audience: computational users + wet lab scientists doing marker discovery
Time: 25–60 minutes
What you’ll learn:

How DE in Cellucid is defined (it compares two highlight pages)
The exact statistics used (Wilcoxon/Mann–Whitney U or Welch’s t-test)
How log2 fold change is computed (including pseudocount)
How to read the volcano plot + top-genes table without common mistakes
How to tune performance settings for large datasets

Prerequisites:

A dataset loaded
Gene expression available (var / genes)
At least one highlight page (you need two “page options” to compare)

What DE is for (in Cellucid)#

Differential Expression (DE) answers:

“Which genes differ between Page A and Page B?”

Where:

Page A and Page B are highlight pages (or derived “Rest of <page>” wildcards).

Common uses:

validate that a selected group has the expected marker genes,
generate a first-pass marker list for follow-up analysis,
sanity-check that an annotation is consistent with expression.

DE in Cellucid is exploratory:

it does not fit covariate models (batch, donor, condition),
it does not replace a full DE pipeline when you need complex designs,
it assumes your page definitions are meaningful groups.

Inputs (how you pick A and B)#

In Analysis → Differential Expression, you select:

Page A: (dropdown)
Page B: (dropdown)

If you have many pages, the selector shows:

base pages under Pages
derived options under Wildcards, including Rest of <page>

What “Rest of <page>” means here#

Rest of Page A means “all cells not in Page A”. This enables one-vs-rest DE without manually building a background page.

Filtering note#

DE uses page membership (stored cell indices), not canvas visibility. If you filtered cells out after creating a page, they may still be included in DE. See Analysis mental model (Pages, Variables, Scope) for the visibility vs membership distinction.

Minimum group size#

DE requires enough cells in both groups:

the implementation enforces a minimum of 10 valid cells per group (default).
if either side is below the minimum, genes may show as invalid/NaN.

Statistics (exact implementation)#

Cellucid supports two statistical methods:

1) Wilcoxon (default) = Mann–Whitney U#

This is a rank-based, non-parametric two-sample test.
It compares the distribution of expression values between the two groups.

Implementation notes:

Cellucid computes the U statistic from a full, tie-aware rank pass over the two selected groups.
The U statistic is not histogram-binned or sampled.
The p-value is exact when both comparison groups contain fewer than 50 finite values and there are no ties.
Otherwise, the p-value uses the exact U statistic with a tie- and continuity-corrected asymptotic reference distribution.

Practical implication:

large groups retain the exact U statistic, but use asymptotic inference and can require substantially more compute time than Welch’s t-test.

2) t-test = Welch’s t-test#

Parametric test comparing means.
Uses Welch’s formulation (does not assume equal variances).

This can be faster than Wilcoxon on some datasets, but is less robust to heavy non-normality/outliers.

For analysis-wide scope and assumptions, see Analysis mental model (Pages, Variables, Scope).

Effect size: log2 fold change (log2FC)#

For each gene, Cellucid computes:

log2FC = log2((meanA + 0.01) / (meanB + 0.01))

Where:

meanA is the mean expression in Page A
meanB is the mean expression in Page B
0.01 is a fixed pseudocount to avoid division by zero

Interpretation:

log2FC > 0 → gene is higher in Page A
log2FC < 0 → gene is higher in Page B

Important

The sign depends on the A/B ordering.

If the biology you expect is “markers of Page B”, consider swapping A and B so that “upregulated” corresponds to Page B.

Multiple testing correction (FDR)#

Cellucid computes a Benjamini–Hochberg adjusted p-value per gene:

this is exposed as adjustedPValue and visualized when “Use FDR-adjusted p-values” is enabled in the volcano plot options.

Default behavior:

the volcano plot uses adjusted p-values by default,
and many summary counts are framed as “FDR < 0.05”.

Outputs (what you see)#

Summary stats (quick sanity check)#

You’ll see:

Genes tested
Significant (FDR < 0.05)
Upregulated
Downregulated

Upregulated/downregulated are defined by the sign of log2FC (Page A relative to Page B).

Volcano plot#

Axes:

x-axis: log2 fold change (A vs B)
y-axis: -log10(p) (raw or adjusted, depending on the toggle)

Coloring (conceptual):

significant up (positive log2FC + passes thresholds)
significant down (negative log2FC + passes thresholds)
not significant (fails thresholds)

Threshold controls (in the plot options):

p-value threshold (0.001 / 0.01 / 0.05 / 0.1)
log2FC threshold (slider, default 1.0 = 2-fold)
use adjusted vs raw p-values
label top N genes
point size and other rendering toggles

Performance settings (when DE is slow)#

DE is gene-heavy: it can touch thousands of genes and hundreds of thousands of cells. Use Performance Settings (collapsible) to tune:

Batch size: how many genes to preload ahead of compute (higher can be faster but uses more memory).
Memory budget: limits how many gene vectors can be in flight without crashing the browser.
Network parallelism: concurrent gene loads (relevant for server/remote loading).
Compute parallelism: number of concurrent genes computed (bounded by worker pool size and memory).

Recommended workflow for very large datasets:

start with smaller pages (more targeted groups),
consider t-test if Wilcoxon is too slow,
export results and do full DE offline for publication workflows.

Edge cases and “gotchas”#

Pages overlap (same cell in A and B): DE is not meaningful; fix page definitions.
Tiny pages: unstable means and unreliable p-values; many genes will fail minimum cell checks.
All-zero/constant genes: log2FC ~ 0 and p-values ~ 1 (expected).
Expression scale ambiguity: DE uses whatever scale you exported (counts vs log1p vs normalized).
Missing gene expression: mode will fail or be empty if var/gene data is unavailable.

Troubleshooting (DE)#

Symptom: “Need at least 1 page for comparison”#

Cause:

you don’t have enough pages/wildcards to populate A and B.

Fix:

create highlight pages in Highlighted Cells (see Highlighting and Selection (Groups, Pages, Tools)),
or use Rest of \<page\> once you have at least one base page.

Symptom: “DE is extremely slow”#

Likely causes:

huge dataset (n_cells) and/or many genes,
exact Wilcoxon ranking costs,
insufficient memory budget leading to throttling.

How to confirm:

try a smaller page (fewer cells),
switch method to t-test and compare runtime,
watch the progress phases; if it stalls on loading, it’s an I/O issue.

Fix:

reduce scope (smaller pages),
adjust Performance Settings (lower batch size and/or memory budget to avoid crashes; increase cautiously for speed),
prefer offline DE for publication-scale runs.

Symptom: “Volcano plot looks wrong / missing points”#

Common causes:

thresholds hide most points (p-value threshold or log2FC threshold),
adjusted p-values are much larger than raw (expected after BH),
many genes have NaN due to insufficient valid cells.

Fix:

lower log2FC threshold,
switch between adjusted and raw p-values to understand correction effects,
increase group sizes / fix page overlap.

Next steps#

Analysis mode: Gene Signature (Gene Signature Score) (score a curated gene set)
Exporting analysis results (export DE tables/plots)
Troubleshooting (analysis) (DE slow, volcano missing points)

Analysis mode: Differential Expression (DE) (Page A vs Page B)#

What DE is for (in Cellucid)#

Inputs (how you pick A and B)#

What “Rest of <page>” means here#

Filtering note#

Minimum group size#

Statistics (exact implementation)#

1) Wilcoxon (default) = Mann–Whitney U#

2) t-test = Welch’s t-test#

Effect size: log2 fold change (log2FC)#

Multiple testing correction (FDR)#

Outputs (what you see)#

Summary stats (quick sanity check)#

Volcano plot#

Top genes table (sidebar convenience)#

Performance settings (when DE is slow)#

Edge cases and “gotchas”#

Troubleshooting (DE)#

Symptom: “Need at least 1 page for comparison”#

Symptom: “DE is extremely slow”#

Symptom: “Volcano plot looks wrong / missing points”#

Next steps#