Codebase Architecture

cellucid-r is intentionally small. Most of the implementation lives in one file, cellucid-r/R/cellucid_prepare.R.

Repository layout (high level)

cellucid-r/R/cellucid_prepare.R
- exports cellucid_prepare() and prepare()
- contains the exporter implementation and helper functions
cellucid-r/man/cellucid_prepare.Rd
- R help page generated/maintained for Bioconductor-style docs
cellucid-r/tests/testthat/
- unit tests validating core files, normalization, quantization, connectivity, vector fields
cellucid-r/vignettes/cellucid.Rmd
- minimal vignette showing a small export workflow
cellucid-r/PUBLISHING.md
- release/publishing checklist

At a high level, the exporter works as follows:

1. Validate embeddings and infer n_cells.
2. Normalize embeddings (center + scale) and write points_*d.bin.
3. Validate/convert latent_space and obs.
4. Export optional vector fields (scaled with embedding normalization).
5. Export obs:
   - continuous values (float32 or quantized)
   - categorical codes (uint8/uint16) + outlier quantiles (latent-space)
   - centroids (embedding-space)
   - write obs_manifest.json
6. Export gene expression (optional):
   - validate var and gene_expression
   - write one dense vector per gene under var/
   - write var_manifest.json
7. Export connectivities (optional):
   - symmetrize and binarize
   - write edge pairs under connectivity/
   - write connectivity_manifest.json
8. Write dataset_identity.json (summary + pointers to files).
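
The normalization, quantization, and connectivity steps above can be illustrated with a small language-neutral sketch. It is written in Python purely for illustration and is not the R implementation. Every function name here is hypothetical. The specific choices (scaling by the largest absolute coordinate, linear min-max quantization to uint8, little-endian row-major float32, dropping self-loops and zero-weight edges) are assumptions, not the documented format.

```python
import struct
from statistics import fmean


def normalize_points(points):
    """Center each axis on its mean, then divide by the largest absolute value.

    ASSUMPTION: the real exporter may use a different scale factor.
    Returns the normalized rows and the scale, so that optional vector
    fields could be divided by the same factor.
    """
    dims = len(points[0])
    means = [fmean(p[d] for p in points) for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    scale = max(abs(v) for row in centered for v in row) or 1.0
    return [[v / scale for v in row] for row in centered], scale


def pack_float32(rows):
    """Flatten rows in row-major order and pack them as little-endian float32.

    ASSUMPTION: byte order and memory layout are illustrative only.
    """
    flat = [v for row in rows for v in row]
    return struct.pack(f"<{len(flat)}f", *flat)


def quantize_uint8(values):
    """Map values linearly from [min, max] onto the integers 0..255.

    ASSUMPTION: linear min-max quantization; the real scheme may differ.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return bytes(round((v - lo) / span * 255) for v in values)


def symmetrize_binarize(weighted_edges):
    """Turn (i, j, weight) triples into unique undirected, unweighted pairs.

    ASSUMPTION: zero-weight edges and self-loops are dropped.
    """
    pairs = {
        (min(i, j), max(i, j))
        for i, j, w in weighted_edges
        if i != j and w != 0
    }
    return sorted(pairs)
```

In this sketch, the bytes returned by pack_float32(normalize_points(embedding)[0]) stand in for the contents of a points_*d.bin file, and the pairs from symmetrize_binarize stand in for what would be written under connectivity/.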

The user-guide docs mirror this structure.

If you add a new feature that writes files:

1. Decide where it belongs:
   - dataset_identity.json (top-level discovery)
   - a dedicated manifest JSON
   - a new subdirectory of binaries
2. Add tests under cellucid-r/tests/testthat/.
3. Update the user guide format spec:
   - Export Directory Format (Specification)
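
To make the three placement options concrete, here is a minimal sketch of a feature that uses all of them at once: a new subdirectory of binaries, a dedicated manifest JSON, and a pointer in dataset_identity.json. It is written in Python for illustration only and is not how the R exporter is implemented. The function name, the data.bin file name, and every JSON key ("files", "n_bytes", "manifests") are hypothetical and are not part of the actual export format.

```python
import json
from pathlib import Path


def write_feature(out_dir, name, payload):
    """Write a hypothetical feature: binary file, manifest, identity pointer.

    ASSUMPTION: all file names and JSON keys below are invented for
    illustration and do not describe the real export directory format.
    """
    out = Path(out_dir)

    # 1. New subdirectory of binaries.
    sub = out / name
    sub.mkdir(parents=True, exist_ok=True)
    (sub / "data.bin").write_bytes(payload)

    # 2. Dedicated manifest JSON describing what was written.
    manifest_name = f"{name}_manifest.json"
    manifest = {"files": [f"{name}/data.bin"], "n_bytes": len(payload)}
    (out / manifest_name).write_text(json.dumps(manifest, indent=2))

    # 3. Top-level discovery: point to the manifest from dataset_identity.json,
    #    preserving whatever the file already contains.
    identity_path = out / "dataset_identity.json"
    identity = json.loads(identity_path.read_text()) if identity_path.exists() else {}
    identity.setdefault("manifests", {})[name] = manifest_name
    identity_path.write_text(json.dumps(identity, indent=2))

    return manifest
```

A test for such a feature would typically export into a temporary directory and assert that the binary, the manifest, and the identity pointer all exist and agree with each other.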