Analysis mode: Gene Signature (Gene Signature Score)#
Audience: wet lab + computational users (gene program scoring)
Time: 20–40 minutes
What you’ll learn:
What a gene signature score represents (in Cellucid terms)
How Cellucid computes the score (mean/sum over genes; per-cell)
How normalization options change interpretation (z-score/min-max)
How to avoid the most common signature failures (missing genes, formatting, scale confusion)
Prerequisites:
A dataset loaded
Gene expression available
A gene list (“signature”) to score
What a gene signature is (user-facing)#
A gene signature is a curated gene list meant to represent a biological state, pathway, or cell program.
Cellucid turns a signature into a per-cell score:
one number per cell,
which can be compared across highlight pages (groups).
Intuition for wet lab users:
a high score means “many genes in the program are high (on the dataset’s expression scale)”,
a low score means “the program is not expressed strongly in these cells”.
Inputs (genes + pages)#
1) Signature genes input#
In Analysis → Gene Signature, you paste genes into Signature Genes.
Important formatting rule:
Gene lists are parsed as comma-separated values (e.g., CD3E, CD4, IL7R).
Important
Newline-only lists are not reliably parsed.
If you paste one gene per line, also include commas or convert to comma-separated format.
Gene matching rules (practical):
matching is exact to the dataset’s gene keys,
no alias mapping is applied (e.g., symbol ↔ Ensembl),
case sensitivity depends on your dataset’s gene keys (treat it as case-sensitive).
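The formatting and matching rules above can be sketched in a few lines of Python. This is not Cellucid's parser; it is a minimal illustration that assumes tokens are split on commas and trimmed of surrounding whitespace, and that matching is exact and case-sensitive. The function names and the example gene keys are hypothetical.

```python
def parse_signature(text):
    """Split a comma-separated gene list, trimming whitespace and dropping empty tokens."""
    return [tok.strip() for tok in text.split(",") if tok.strip()]


def match_genes(signature, dataset_genes):
    """Exact, case-sensitive matching against the dataset's gene keys (no alias mapping)."""
    keys = set(dataset_genes)
    found = [g for g in signature if g in keys]
    missing = [g for g in signature if g not in keys]
    return found, missing


# Hypothetical gene keys, standing in for whatever identifiers your dataset uses.
dataset_genes = ["CD3E", "CD4", "IL7R", "MS4A1"]

found, missing = match_genes(
    parse_signature("CD3E, cd4, IL7R, ENSG00000010610"), dataset_genes
)
print(found)    # ['CD3E', 'IL7R']
print(missing)  # ['cd4', 'ENSG00000010610']

# A newline-only paste contains no commas, so it stays a single (unmatchable) token.
newline_only = parse_signature("CD3E\nCD4\nIL7R")
print(len(newline_only))  # 1
```

Under these assumptions, a lowercase symbol and an Ensembl-style ID both end up in the missing list, which is why identifiers should be converted to the dataset's own keys before pasting.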
2) Pages (“Compare pages”)#
You select pages under Compare pages. By default, if you have pages and haven’t selected anything, the UI will often start by selecting all pages.
Gene Signature supports derived pages:
Rest of <page> for one-vs-rest comparisons.
Scoring algorithm (exact)#
For each selected page, Cellucid computes a score for each cell in that page.
Let G be the set of genes you entered.
For each cell i:
collect expression values x_{i,g} for all genes g ∈ G that exist and have finite values
compute:
Mean expression: score_i = mean_g(x_{i,g}) over available genes
Sum expression: score_i = sum_g(x_{i,g}) over available genes
Notes:
Genes missing from the dataset are skipped.
If a cell has zero valid gene values (e.g., all genes missing), its score becomes NaN and is excluded from summary statistics/plots.
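As a rough illustration of the scoring rule described above, the sketch below scores one cell at a time from a gene-to-value mapping. It is a reimplementation for intuition, not Cellucid's code; the function name, the dictionary layout, and the example values are assumptions.

```python
import math


def signature_score(cell_expr, genes, method="mean"):
    """Score one cell; cell_expr maps gene key -> expression value."""
    if method not in ("mean", "sum"):
        raise ValueError(f"unknown method: {method!r}")
    # Keep only genes that exist for this cell and have finite values.
    values = [
        cell_expr[g]
        for g in genes
        if g in cell_expr and math.isfinite(cell_expr[g])
    ]
    if not values:
        return math.nan  # no valid gene values: the cell is excluded downstream
    total = sum(values)
    return total / len(values) if method == "mean" else total


# Hypothetical expression values on whatever scale the dataset stores.
cells = {
    "cell_1": {"CD3E": 2.0, "CD4": 1.0, "IL7R": 3.0},
    "cell_2": {"CD3E": 0.0, "CD4": math.nan, "IL7R": 4.0},
    "cell_3": {"MS4A1": 5.0},
}
genes = ["CD3E", "CD4", "IL7R", "NOT_A_GENE"]

mean_scores = {cid: signature_score(expr, genes, "mean") for cid, expr in cells.items()}
sum_scores = {cid: signature_score(expr, genes, "sum") for cid, expr in cells.items()}
print(mean_scores)  # cell_1 and cell_2 both average to 2.0; cell_3 is nan
print(sum_scores)   # cell_1 sums to 6.0, cell_2 to 4.0; cell_3 is nan
```

Note that cell_1 and cell_2 share the same mean but differ in sum, because the sum also reflects how many genes contributed a valid value.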
Note
Signature scores are computed on the expression values present in your dataset.
If your dataset stores log-normalized expression, the score is on that scale. If your dataset stores counts, the score is on the count scale.
About the “Median expression” option#
The UI exposes a “Median expression” option, but the current backend aggregation is mean/sum-based. Until this is updated, treat “Median” as experimental and prefer Mean expression for reproducibility.
Normalization options (what they do)#
After scoring, you can normalize the scores:
None: keep scores as computed
Z-score: transform scores to (x - μ) / σ using μ and σ computed across all selected pages combined
Min-Max (0–1): transform scores to (x - min) / (max - min) using global min/max across all selected pages combined
Practical implication:
normalization makes scores more comparable across pages within the current analysis run,
but it also changes the meaning of “high score” (especially z-score).
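To see why normalized values only make sense within one analysis run, here is a small sketch of both transforms using pooled statistics. It assumes a population standard deviation for σ and raises an error when all scores are identical; the source does not specify either detail, and the function name and data are hypothetical.

```python
import math
import statistics


def normalize_scores(scores_by_page, method="zscore"):
    """Normalize per-cell scores using statistics pooled over all selected pages."""
    pooled = [s for scores in scores_by_page.values() for s in scores if math.isfinite(s)]
    if method == "zscore":
        center = statistics.fmean(pooled)
        scale = statistics.pstdev(pooled)  # assumption: population standard deviation
    elif method == "minmax":
        center = min(pooled)
        scale = max(pooled) - center
    else:
        raise ValueError(f"unknown method: {method!r}")
    if scale == 0:
        raise ValueError("all scores are identical; normalization is undefined")
    return {
        page: [(s - center) / scale if math.isfinite(s) else math.nan for s in scores]
        for page, scores in scores_by_page.items()
    }


# The same raw scores for page A, normalized with and without page B selected.
with_b = normalize_scores({"A": [1.0, 2.0, 3.0], "B": [5.0]}, "minmax")
only_a = normalize_scores({"A": [1.0, 2.0, 3.0]}, "minmax")
print(with_b["A"])  # [0.0, 0.25, 0.5]
print(only_a["A"])  # [0.0, 0.5, 1.0]

z = normalize_scores({"A": [1.0, 2.0, 3.0], "B": [5.0]}, "zscore")
```

The raw score 3.0 maps to 0.5 when page B is included and to 1.0 when it is not, so normalized values should not be compared across runs with different page selections.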
Outputs and interpretation#
You’ll typically see:
A distribution plot per page (violin/box/histogram depending on your selection)
A per-page summary table (mean/median/std + number of cells with valid scores)
A “Genes in Signature” list (chips) so you can verify inputs
Interpretation guidance:
Compare effect size (how separated are distributions) before staring at p-values.
If two pages differ strongly, check whether the signature is really a program or just a proxy for cell type composition.
Statistical tests:
When you have ≥2 pages selected, the modal can show the same style of distribution-comparison tests used for continuous variables in Detailed mode. Treat these as exploratory.
Export (CSV)#
Gene Signature exports a CSV named like gene_signature_scores.csv containing:
page
score
Important limitations:
exported rows do not include cell indices/IDs, so the file is not directly joinable back to AnnData without additional context.
If you need per-cell mapping:
compute the signature in Python and attach it to adata.obs, or
export a per-cell table through your own workflow.
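One possible shape for such a per-cell table is sketched below using only the standard library. The column names, cell IDs, and helper name are hypothetical, and the scores are assumed to have been computed elsewhere. With AnnData, the cell IDs would typically come from adata.obs_names, and the same list of scores could instead be assigned as a new column of adata.obs.

```python
import csv
import io
import math


def write_per_cell_scores(handle, cell_ids, pages, scores):
    """Write one row per cell so scores can be joined back on the cell ID."""
    writer = csv.writer(handle)
    writer.writerow(["cell_id", "page", "score"])
    for cell_id, page, score in zip(cell_ids, pages, scores):
        # Leave the score blank for cells without a valid value (NaN).
        writer.writerow([cell_id, page, "" if math.isnan(score) else score])


# Hypothetical cell IDs, page labels, and precomputed scores.
cell_ids = ["AAAC-1", "AAAG-1", "AACT-1"]
pages = ["T cells", "T cells", "B cells"]
scores = [2.0, 2.0, math.nan]

buffer = io.StringIO()  # swap for open("per_cell_scores.csv", "w", newline="") to write a file
write_per_cell_scores(buffer, cell_ids, pages, scores)
print(buffer.getvalue())
```

Keeping the cell ID as the first column is what makes the table joinable, which the built-in export lacks.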
Edge cases and pitfalls#
Most genes not found: score becomes meaningless (based on a tiny subset).
First gene missing: some pages may show “No data available” behavior; put a known-present gene early in the list.
Duplicate genes in the list: duplicates effectively up-weight that gene (it is added multiple times).
Huge signatures (hundreds of genes): can be slow and may stress browser memory.
Housekeeping-dominated signatures: scores track library size/QC rather than biology.
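Several of these pitfalls can be caught before pasting the list. The sketch below is a hypothetical pre-flight check, not a Cellucid feature: it reports coverage, duplicates, and whether the first gene is present, and then shows the up-weighting effect of a duplicate under the assumption (stated above) that a duplicated gene is added to the mean once per occurrence.

```python
from collections import Counter


def preflight(signature, dataset_genes):
    """Summarize coverage, duplicates, and first-gene presence for a signature."""
    keys = set(dataset_genes)
    unique = list(dict.fromkeys(signature))  # de-duplicate, preserving order
    found = [g for g in unique if g in keys]
    counts = Counter(signature)
    return {
        "n_unique": len(unique),
        "n_found": len(found),
        "fraction_found": len(found) / len(unique) if unique else 0.0,
        "duplicates": sorted(g for g, n in counts.items() if n > 1),
        "first_gene_found": bool(signature) and signature[0] in keys,
    }


# Hypothetical dataset keys and signature.
report = preflight(["GENE_X", "CD3E", "CD3E", "CD4"], ["CD3E", "CD4", "IL7R"])
print(report)

# Effect of a duplicate on a mean score for one cell (hypothetical values).
values = {"CD3E": 4.0, "CD4": 1.0}
with_duplicate = sum(values[g] for g in ["CD3E", "CD3E", "CD4"]) / 3
deduplicated = sum(values[g] for g in ["CD3E", "CD4"]) / 2
print(with_duplicate, deduplicated)  # 3.0 2.5
```

Here the duplicated gene pulls the mean from 2.5 to 3.0, so de-duplicating the list first is the safer default unless the weighting is intentional.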
Troubleshooting (Gene Signature)#
Symptom: “Everything is empty / no valid values”#
Likely causes:
gene expression is not available in this dataset/loading method,
gene keys don’t match (symbols vs Ensembl, case mismatch),
signature input formatting is wrong (newline-only or extra punctuation).
How to confirm:
try a single known gene (e.g., MS4A1) as a “signature” and see if it produces non-empty output.
Fix:
correct gene identifiers to match your dataset,
ensure comma-separated input,
load the dataset with gene expression (see Data Loading in the Web App (All Paths)).
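If you have the dataset's gene keys available in Python, a quick diagnostic along these lines can tell you which of the causes above applies to each gene. This is a hypothetical helper, not part of Cellucid; the Ensembl check is a loose pattern for IDs such as ENSG… or ENSMUSG… and is only a heuristic.

```python
import re

# Loose heuristic for Ensembl-style gene IDs, with an optional version suffix.
ENSEMBL_LIKE = re.compile(r"^ENS[A-Z]*G\d+(\.\d+)?$")


def diagnose(signature, dataset_genes):
    """Classify each signature gene as ok, case mismatch, Ensembl-like, or not found."""
    keys = set(dataset_genes)
    by_lower = {k.lower(): k for k in dataset_genes}
    report = {}
    for gene in signature:
        if gene in keys:
            report[gene] = "ok"
        elif gene.lower() in by_lower:
            report[gene] = f"case mismatch: dataset key is {by_lower[gene.lower()]}"
        elif ENSEMBL_LIKE.match(gene):
            report[gene] = "looks like an Ensembl ID; dataset keys may be symbols"
        else:
            report[gene] = "not found"
    return report


# Hypothetical dataset keys and signature entries.
result = diagnose(
    ["MS4A1", "cd79a", "ENSG00000156738", "FOO1"],
    ["MS4A1", "CD79A", "CD19"],
)
for gene, status in result.items():
    print(f"{gene}: {status}")
```

A "case mismatch" or "Ensembl ID" result points to an identifier fix, while an all-"not found" report for genes you expect to exist suggests checking how the dataset was loaded.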
Screenshot (placeholder): Gene Signature mode scores a gene list and shows a per-cell signature value you can compare across groups.
Next steps#
Analysis mode: Marker Genes (Genes Panel) (marker discovery across categorical groups)
Exporting analysis results (export signature results)
Troubleshooting (analysis) (missing genes, missing expression)