Analysis mode: Marker Genes (Genes Panel)#

Audience: everyone (especially computational users; wet lab-friendly for interpretation)
Time: 25–60 minutes
What you’ll learn:

How Marker Genes discovers one-vs-rest markers for many groups at once
How grouping works (categorical obs field → groups)
How to interpret log2FC, p-values/FDR, and percent-expressing
How Ranked vs Clustered vs Custom modes differ
How caching and performance settings affect runtime

Prerequisites:

A dataset loaded
Gene expression available
At least one categorical obs field that can define groups (e.g., cell_type or cluster)

What Marker Genes is for#

Marker Genes is the “many groups” marker discovery tool.

Use it when you want:

marker genes for each cell type/cluster in a categorical field,
a heatmap view of those markers across groups,
and an exportable marker table.

How it differs from DE:

Analysis mode: Differential Expression (DE) (Page A vs Page B) compares two pages against each other.
Marker Genes computes one-vs-rest markers for every group in a categorical obs field.

Inputs (what you choose)#

1) Group By (categorical obs field)#

You pick a categorical obs field under Group By (e.g., cell_type).

Cellucid then builds groups as:

one group per category label (e.g., B cell, T cell, …),
across the full dataset (not highlight pages).

Important behaviors:

cells with missing/invalid category codes are excluded from grouping,
groups are sorted by size (largest first) so that their order in the UI stays stable.
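The grouping rules above can be sketched in a few lines of Python. This is an illustration only, not Cellucid's code: the function name `build_groups`, the use of `None` or an empty string for a missing label, and the tie-break by label name are all assumptions made for the example.

```python
from collections import defaultdict

def build_groups(labels):
    """Map each valid category label to the list of cell indices carrying it."""
    groups = defaultdict(list)
    for cell_index, label in enumerate(labels):
        # Assumption: a missing category is represented as None or "".
        if label is None or label == "":
            continue  # excluded from every group
        groups[label].append(cell_index)
    # Largest group first; ties broken by label (an assumption) so the
    # resulting order is deterministic.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

labels = ["T cell", "B cell", None, "T cell", "NK", "T cell", "B cell"]
groups = build_groups(labels)
for name, cells in groups:
    print(name, cells)
```

Cell 2 has no label, so it appears in no group and would also be absent from every out-group.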

2) Mode#

Marker Genes supports three modes:

Ranked Genes: show ranked marker lists per group
Clustered: build a marker heatmap and cluster genes/groups
Custom Genes: skip marker discovery; visualize a user-supplied gene list across groups

3) Statistical method#

Wilcoxon (default): rank-based test, robust to outliers
t-test: Welch’s t-test (mean comparison)

4) Use cached results#

If enabled, Cellucid can reuse cached marker results for the same dataset + group-by field + settings. This can make repeated runs much faster.
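One way to picture "same dataset + group-by field + settings" is as a lookup key derived from those three inputs. The sketch below is hypothetical: Cellucid's real cache key format is not described here, and `cache_key`, the field names, and the example settings are invented for illustration.

```python
import hashlib
import json

def cache_key(dataset_id, group_by, settings):
    """Build a deterministic key from the inputs that define a marker run."""
    payload = json.dumps(
        {"dataset": dataset_id, "groupBy": group_by, "settings": settings},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

key_a = cache_key("pbmc", "cell_type", {"method": "wilcoxon", "bins": 256})
key_b = cache_key("pbmc", "cell_type", {"bins": 256, "method": "wilcoxon"})
key_c = cache_key("pbmc", "cell_type", {"method": "ttest", "bins": 256})

print(key_a == key_b)  # same inputs, different dict order
print(key_a == key_c)  # different statistical method
```

A key of this kind only changes when one of its inputs changes. If the underlying data is modified while the dataset identifier stays the same, a cached result would still match, which is why disabling the cache forces a fresh computation.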

5) Performance Settings (collapsible)#

Marker discovery can be heavy. The same performance controls used by DE apply here:

batch size, memory budget, network parallelism, compute parallelism, Wilcoxon bins.

Statistics (what is computed)#

Marker discovery is one-vs-rest per group.

For each group g and each gene:

“in-group” = cells in group g
“out-group” = all other cells (with valid group labels)

For each gene and group, Cellucid computes (conceptually):

meanInGroup, meanOutGroup
log2FoldChange = log2((meanInGroup + 0.01) / (meanOutGroup + 0.01))
pValue (Wilcoxon rank-sum / Mann-Whitney U test, or Welch’s t-test)
adjustedPValue via Benjamini–Hochberg (computed per group across genes)
percentInGroup, percentOutGroup = percent of cells with expression > 0
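As a worked illustration of the quantities listed above, here is a minimal Python sketch for a single gene and a single group. It covers the means, the log2 fold change with the 0.01 pseudocount, percent-expressing, and a standard Benjamini-Hochberg adjustment. The p-value test itself is left out, and the function names are assumptions for this example rather than Cellucid's internals.

```python
import math

PSEUDOCOUNT = 0.01  # from the log2FoldChange formula above

def mean(values):
    return sum(values) / len(values)

def percent_expressing(values):
    """Percent of cells with expression strictly greater than zero."""
    return 100.0 * sum(1 for v in values if v > 0) / len(values)

def effect_sizes(in_group, out_group):
    mean_in, mean_out = mean(in_group), mean(out_group)
    return {
        "meanInGroup": mean_in,
        "meanOutGroup": mean_out,
        "log2FoldChange": math.log2(
            (mean_in + PSEUDOCOUNT) / (mean_out + PSEUDOCOUNT)
        ),
        "percentInGroup": percent_expressing(in_group),
        "percentOutGroup": percent_expressing(out_group),
    }

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

stats = effect_sizes([0.0, 2.0, 4.0, 2.0], [0.0, 0.0, 1.0, 1.0])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(stats)
print(adjusted)
```

The adjustment runs over all genes within one group, matching the "per group across genes" note above. The pseudocount keeps the ratio finite when a gene has zero mean expression on either side.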

Markers are filtered by thresholds (typically controlled in the expanded view):

p-value/FDR threshold (default ~0.05)
|log2FC| threshold (default ~1.0)
and whether to use adjusted p-values by default
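The threshold filter can be sketched as a simple predicate over one marker row. The defaults mirror the approximate values listed above. Whether the comparisons are inclusive, and the parameter names, are assumptions for this example.

```python
def passes_thresholds(row, p_threshold=0.05, log2fc_threshold=1.0,
                      use_adjusted=True):
    """Return True if a marker row survives both filters."""
    # Pick adjusted (FDR) or raw p-value depending on the toggle.
    p = row["adjustedPValue"] if use_adjusted else row["pValue"]
    # Assumption: boundaries are inclusive on both filters.
    return p <= p_threshold and abs(row["log2FoldChange"]) >= log2fc_threshold

rows = [
    {"gene": "A", "log2FoldChange": 2.3, "pValue": 1e-8, "adjustedPValue": 1e-6},
    {"gene": "B", "log2FoldChange": 0.4, "pValue": 1e-9, "adjustedPValue": 1e-7},
    {"gene": "C", "log2FoldChange": -1.5, "pValue": 0.01, "adjustedPValue": 0.2},
]

kept_adjusted = [r["gene"] for r in rows if passes_thresholds(r)]
kept_raw = [r["gene"] for r in rows if passes_thresholds(r, use_adjusted=False)]
print(kept_adjusted, kept_raw)
```

Gene B is significant but has a small effect, so it is dropped either way. Gene C only passes when raw p-values are used, which shows how much the adjusted/raw choice can change a marker list.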

Outputs (what you see)#

Ranked Genes mode#

Select a group from a dropdown.
View the top markers for that group.
Use Expand (modal) for the full table and exports.

Clustered mode#

A heatmap of genes (rows) vs groups (columns).
Optional clustering of rows/columns (distance + linkage choices).
Plot options usually include:
- p-value threshold / log2FC threshold
- use adjusted vs raw p-values
- transform (e.g., z-score/log1p) and colorscale choices
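To make the transform option concrete, the sketch below applies a z-score to each heatmap row (one gene across groups). It assumes scaling is done per gene and uses the population standard deviation. Both are assumptions for illustration, and `zscore_rows` is not a Cellucid function.

```python
import math

def zscore_rows(matrix):
    """Z-score each row (gene) across its columns (groups)."""
    scaled = []
    for row in matrix:
        mu = sum(row) / len(row)
        variance = sum((v - mu) ** 2 for v in row) / len(row)
        sd = math.sqrt(variance)
        # A gene with identical values in every group has no spread;
        # return zeros instead of dividing by zero.
        scaled.append([0.0 if sd == 0 else (v - mu) / sd for v in row])
    return scaled

heatmap = [
    [1.0, 2.0, 3.0],   # gene that varies across groups
    [5.0, 5.0, 5.0],   # gene that is flat across groups
]
scaled = zscore_rows(heatmap)
print(scaled)
```

Row scaling puts genes with very different absolute expression on a comparable color scale, at the cost of hiding absolute magnitude.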

Custom Genes mode#

You provide genes.
Cellucid builds an expression matrix for those genes across groups.
No p-values are computed (because this mode is visualization, not discovery).
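A genes-by-groups matrix of this kind can be sketched as follows. The example summarizes each group by its mean expression, which is an assumption (the summary statistic is not stated here). The data layout and the `group_mean_matrix` helper are also hypothetical.

```python
from collections import defaultdict

def group_mean_matrix(expression, labels, genes):
    """Return (group_names, {gene: [mean per group]}) for the given genes.

    expression: dict mapping gene -> list of per-cell values
    labels:     list of per-cell group labels (None = no valid label)
    """
    cells_by_group = defaultdict(list)
    for cell_index, label in enumerate(labels):
        if label is not None:
            cells_by_group[label].append(cell_index)
    group_names = sorted(cells_by_group)
    matrix = {}
    for gene in genes:
        values = expression[gene]
        matrix[gene] = [
            sum(values[i] for i in cells_by_group[g]) / len(cells_by_group[g])
            for g in group_names
        ]
    return group_names, matrix

expression = {
    "CD3E": [5.0, 0.0, 3.0, 0.0],
    "MS4A1": [0.0, 4.0, 0.0, 2.0],
}
labels = ["T cell", "B cell", "T cell", "B cell"]
group_names, matrix = group_mean_matrix(expression, labels, ["CD3E", "MS4A1"])
print(group_names)
print(matrix)
```

The result has the same shape as the heatmap view: one row per gene, one column per group.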

Export (CSV)#

Marker Genes exports depend on what you’re viewing:

Heatmap CSV: gene column + one column per group (matrix values)
Ranked markers CSV: group,gene,rank,log2FoldChange,pValue,adjustedPValue,meanInGroup,meanOutGroup,percentInGroup,percentOutGroup

Use exports when you need:

reproducible reports,
downstream filtering in R/Python,
figure preparation outside the app.
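For downstream filtering in Python, the ranked markers CSV can be read with the standard library alone. The sketch below uses the column names listed above. The sample rows are made-up values for illustration, and `top_markers` is a hypothetical helper.

```python
import csv
import io

# Illustrative rows only; a real export would be read from a file.
SAMPLE_CSV = """group,gene,rank,log2FoldChange,pValue,adjustedPValue,meanInGroup,meanOutGroup,percentInGroup,percentOutGroup
T cell,CD3E,1,3.2,1e-30,1e-27,2.5,0.2,85.0,6.0
T cell,ACTB,2,0.2,0.04,0.3,5.1,4.9,99.0,98.0
B cell,MS4A1,1,4.1,1e-40,1e-37,3.0,0.1,90.0,3.0
"""

def top_markers(csv_text, max_adjusted_p=0.05, min_abs_log2fc=1.0):
    """Group the genes that pass both filters by their marker group."""
    selected = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # CSV values arrive as strings, so convert before comparing.
        if float(row["adjustedPValue"]) > max_adjusted_p:
            continue
        if abs(float(row["log2FoldChange"])) < min_abs_log2fc:
            continue
        selected.setdefault(row["group"], []).append(row["gene"])
    return selected

markers = top_markers(SAMPLE_CSV)
print(markers)
```

To work from an exported file, replace `io.StringIO(csv_text)` with a file opened via `open(path, newline="")`.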

Edge cases and pitfalls#

Small groups can block the whole run#

Marker discovery enforces a minimum group size (default ~10 cells). If even one group falls below the minimum, the whole run can fail, not just the analysis for that group.

Workarounds:

choose a different categorical field (coarser grouping),
merge rare categories into “Other” in preprocessing,
create a derived categorical field that excludes rare categories.
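The "merge rare categories" workaround can be done before loading the data. Here is a minimal sketch in plain Python. The function name, the `"Other"` label, and the threshold of 10 (taken from the approximate default above) are assumptions for the example.

```python
from collections import Counter

def merge_rare_categories(labels, min_cells=10, other_label="Other"):
    """Relabel categories with fewer than min_cells cells as other_label."""
    counts = Counter(label for label in labels if label is not None)
    return [
        # Missing labels stay missing; only rare categories are merged.
        label if label is None or counts[label] >= min_cells else other_label
        for label in labels
    ]

labels = ["T cell"] * 12 + ["B cell"] * 11 + ["pDC"] * 3 + ["Mast"] * 8 + [None]
merged = merge_rare_categories(labels)
merged_counts = Counter(label for label in merged if label is not None)
print(merged_counts)
```

Check the merged counts afterwards: if the combined rare cells still total fewer than the minimum, the "Other" group would itself be too small, and excluding those cells is the remaining option.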

No markers found#

Common causes:

thresholds too strict,
groups are extremely similar,
gene expression scale is inappropriate for the test (e.g., already heavily transformed),
missing gene expression.

Caching confusion#

If “Use cached results” is enabled, reruns may return instantly with the same markers. Disable caching if you suspect the underlying dataset has changed or you want a fresh recompute.

Troubleshooting (Marker Genes)#

Symptom: “No categorical fields available”#

Cause:

dataset has no categorical obs annotations.

Fix:

export or load a dataset that includes cluster or cell type labels as a categorical obs field.

Symptom: “Group ‘X’ has only N cells. Minimum required: 10.”#

Cause:

at least one category is too small for robust marker discovery.

Fix:

merge rare categories or choose a different group-by field (see Edge cases above).

Symptom: “Analysis is very slow / browser becomes unresponsive”#

Fix:

lower the Performance Settings (smaller batch size, less parallelism) to reduce memory pressure,
keep fewer tabs/windows open,
consider server mode for large datasets (data loading + memory stability).

Figure (screenshot pending): Marker Genes discovers one-vs-rest markers per group and visualizes them as ranked lists or a clustered heatmap.

Next steps#

Analysis mode: Differential Expression (DE) (Page A vs Page B) (two-page DE with volcano plot)
Exporting analysis results (what each mode exports)