CryoET + AI Data Portal

A Pydantic-validated metadata schema, a directory-walking catalog scanner, a FastAPI read API, and a TanStack Start + Material UI frontend for the CryoET + AI project. The portal answers one question across both the experimental and simulation arms of the project: which conditions have we covered, and which still need cryoET imaging, simulation, or both?

Status: draft / proposed. Schema fields and conventions are still evolving as researchers start authoring metadata against it.

Repository map

Path	Contents
`src/schema/`	Authoritative Pydantic schema, JSON Schema generators, and the `validate` CLI.
`src/catalog/`	Directory-walking scanner that builds the catalog DB from `sample.toml` + `acquisition.toml` + MDOC/MRC headers. Includes the FastAPI read API under `src/catalog/api/`.
`frontend/`	React + TanStack Start + Material UI app that reads from the FastAPI server.
`deploy/`	Docker, Kubernetes/OpenShift manifests, nginx config, and the deployment guide (`deploy/DEPLOYMENT.md`).
`templates/`	Starter `sample.toml` and `acquisition.toml` files, plus directory skeletons with the TOML files in their expected locations, for new experimental (cryoET) and simulation (MD + synthetic cryoET) samples.
`docs/data_organization.md`	The on-disk layout and TOML metadata authoring guide for researchers.
`docs/architecture.md`	System architecture overview.
`.claude/plans/`	Implementation plans, including the catalog scanner plan.
`pyproject.toml` / `pixi.lock`	PyPI dependencies (`[project]`), and pixi config (`[tool.pixi.*]`).

For the schema itself, see docs/schema.md (human reference) and src/schema/schema.py (Pydantic source of truth).

Development

Note: This setup guide assumes you are working on a machine with access to the Janelia file system.

Install pixi.
From the repo root, run pixi install to materialize the Python environments.

The frontend's Node dependencies are installed automatically the first time you run pixi run frontend; the install is repeated only when package.json or package-lock.json changes. You don't need a separate npm install step.

Create the database. Pass the path to the data root via the CATALOG_DATA_ROOT environment variable. The scanner walks the samples under that path and writes a SQLite database named catalog.db to the repo root.

To also pre-generate tomogram thumbnails, set CATALOG_THUMBNAIL_DIR to a writable directory (or pass --thumbnail-dir). A plain rescan auto-heals a wiped cache; --force fully rebuilds it.

CATALOG_DATA_ROOT=/path/to/data CATALOG_THUMBNAIL_DIR=/path/to/thumbnails pixi run scan --init
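After a scan it can be handy to confirm that catalog.db was actually populated. The following is a minimal sketch using only the standard library; it makes no assumption about the catalog's table names and simply reports whatever tables SQLite lists. The helper name summarize_catalog is hypothetical and not part of the repo.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def summarize_catalog(db_path: Path) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    if not db_path.exists():
        raise FileNotFoundError(f"{db_path} not found; run the scan first")
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # The names come straight from sqlite_master, so quoting them is enough.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling summarize_catalog(Path("catalog.db")) from the repo root returns one entry per table; an empty dict or all-zero counts suggests the scan did not find any samples under the data root.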

The portal has two processes: the FastAPI backend (reads the catalog DB) and the TanStack Start frontend (server-renders + hydrates a React app, proxying /api to FastAPI). Run them in two terminals.

Terminal 1 — API:

CATALOG_DATA_ROOT=/path/to/data CATALOG_THUMBNAIL_DIR=/path/to/thumbnails pixi run api

Serves http://localhost:8000. Swagger UI at /docs.
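Besides the Swagger UI, FastAPI publishes a machine-readable OpenAPI document, by default at /openapi.json. The sketch below assumes the portal keeps that default path (the README does not say either way) and lists the routes a running server exposes. The function names are illustrative only.

```python
import json
from typing import List
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def list_routes(openapi: dict) -> List[str]:
    """Flatten an OpenAPI document into sorted 'METHOD /path' strings."""
    routes = []
    for path, operations in openapi.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:  # skip non-method keys such as "parameters"
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI document from a running API server.

    Assumes the default FastAPI location, <base_url>/openapi.json.
    """
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the API running, list_routes(fetch_openapi()) gives a quick overview of the read endpoints without opening a browser.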

No hot-reload. The API runs with --no-reload (single worker). Neuroglancer's in-process HTTP server is incompatible with uvicorn's --reload mode, which tries to bind a second HTTP server on the same port.

Terminal 2 — Frontend:

pixi run frontend

Open the data portal at http://localhost:3000.

Alternate port

If port 8000 is taken, pass uvicorn flags through to use an alternate port. You can also change the IP binding:

pixi run api --host 0.0.0.0 --port 8034
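To pick an alternate port without trial and error, you can probe candidates first. This is a small standard-library sketch, not something the repo ships; the helper names are hypothetical, and the check is only a snapshot (another process could claim the port between the probe and the server start).

```python
import socket
from typing import List, Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(candidates: List[int]) -> Optional[int]:
    """Return the first candidate port that is currently free, or None."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None
```

For example, first_free_port([8000, 8034]) returns 8000 when nothing is listening there, and otherwise falls through to 8034; the result can then be passed to --port.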

If you change the backend host/port, you will also need to point the frontend to it. The frontend reads its dev-server settings from frontend/.env.local (gitignored). Create it like this:

# Backend the /api proxy points to (default: http://localhost:8000)
API_PROXY_TARGET=http://localhost:8034

# Port the Vite dev server listens on (default: 3000)
FRONTEND_PORT=3030

Neuroglancer configuration

The "View in Neuroglancer" feature starts an in-process HTTP server inside the API process. It listens on its own port, separate from the API port, and is not behind nginx — the browser connects to it directly.

Environment variable	Default	Description
`NEUROGLANCER_BIND_ADDRESS`	`0.0.0.0`	IP address the Neuroglancer server binds to.
`NEUROGLANCER_PORT`	`8050`	Port the Neuroglancer server listens on. Must be published separately from the API port; the browser connects to this port directly (not through nginx).
`NEUROGLANCER_MAX_VIEWERS`	`8`	Maximum number of concurrent viewers held in the LRU registry.
`DASHBOARD_HOSTNAME`	(unset)	Overrides the hostname in Neuroglancer viewer URLs. Use in deployments where the server-side host differs from what the browser sees.
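The table above can be read as a small settings object. The sketch below is one way such configuration might be loaded; the class and method names are hypothetical, and only the variable names and defaults are taken from the table. The fallback to the request's host when DASHBOARD_HOSTNAME is unset is an assumption, since the README only says the variable overrides the hostname.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NeuroglancerSettings:
    bind_address: str
    port: int
    max_viewers: int
    dashboard_hostname: Optional[str]

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "NeuroglancerSettings":
        """Read the four variables, falling back to the documented defaults."""
        return cls(
            bind_address=env.get("NEUROGLANCER_BIND_ADDRESS", "0.0.0.0"),
            port=int(env.get("NEUROGLANCER_PORT", "8050")),
            max_viewers=int(env.get("NEUROGLANCER_MAX_VIEWERS", "8")),
            # Treat an empty string the same as unset.
            dashboard_hostname=env.get("DASHBOARD_HOSTNAME") or None,
        )

    def public_host(self, request_host: str) -> str:
        """Hostname for viewer URLs: the override if set, else the request host (assumed)."""
        return self.dashboard_hostname or request_host
```

Passing a plain dict instead of os.environ, as in NeuroglancerSettings.from_env({"NEUROGLANCER_PORT": "9000"}), makes the behavior easy to check without touching the real environment.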

The API must run as a single worker with --no-reload because the Neuroglancer server is process-global. Running multiple workers or hot-reloading would attempt to bind a second HTTP server on the same port.
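NEUROGLANCER_MAX_VIEWERS caps an LRU registry of viewers. The README does not show how that registry is implemented, so the following is only a generic illustration of least-recently-used eviction with a fixed capacity; LRURegistry is a hypothetical name. Because such a registry lives in process memory, it also illustrates why viewer state would not be shared across multiple workers.

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class LRURegistry(Generic[V]):
    """Keep at most max_items entries, evicting the least recently used."""

    def __init__(self, max_items: int = 8) -> None:
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._max_items = max_items
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: V) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._items)
```

With a capacity of 8, opening a ninth viewer would evict whichever of the first eight was touched longest ago.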

Production deployment

For Kubernetes deployment, see deploy/DEPLOYMENT.md.

Testing Docker deployment locally

This models the production deployment using local Docker services. Nginx is the only port exposed to the host and proxies /api/* to FastAPI and everything else to the frontend SSR server.

Prerequisites: Docker and Docker Compose installed.

Create a .env file in the repo root:

CATALOG_DATA_ROOT=/path/to/data
NGINX_PORT=80            # optional, defaults to 80

Build all images:

docker compose build

Run the scanner to populate the database (writes into the catalog-db Docker volume):

docker compose --profile scan run --rm scanner

--profile scan activates the scanner service, which is excluded from the default docker compose up because in production it will run as a Kubernetes CronJob. run --rm starts it as a one-shot container and removes it when it exits.

Start the stack:

docker compose up

Open http://localhost (or http://localhost:<NGINX_PORT> if you changed the port). The API and frontend ports (8000 and 3000) are internal to the Docker network and not accessible from the host.

Resetting after schema changes

The SQLite database persists in the catalog-db named volume across restarts. If the ORM schema has changed since the DB was created (new columns, renamed enums), the API will return 500 errors. Fix by wiping the volume and rescanning:

docker compose down -v
docker compose --profile scan run --rm scanner
docker compose up
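Before wiping the volume, you may want to confirm that schema drift is really the cause of the 500 errors. The sketch below compares the columns present in a SQLite database against a set you expect; the function name is hypothetical, and the expected table and column names must come from the current ORM models, since none are listed in this README.

```python
import sqlite3
from typing import Dict, Set


def missing_columns(
    conn: sqlite3.Connection, expected: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Return {table: columns that are expected but absent from the database}."""
    problems: Dict[str, Set[str]] = {}
    for table, columns in expected.items():
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        # A table that does not exist yields no rows, so all its columns count as missing.
        present = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        absent = set(columns) - present
        if absent:
            problems[table] = absent
    return problems
```

An empty result means every expected column exists; anything else points to a database created from an older schema, in which case the wipe-and-rescan sequence above applies.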

Maintaining the schema

src/schema/schema.py is the single source of truth for field definitions. When changing a field:

Edit schema.py and the canonical template(s) — templates/sample.toml / templates/acquisition.toml.
Run pixi run sync to regenerate the derived artifacts: schema.json, acquisition.schema.json, and the templates/sample_id_{data_type}/ starter copies.
Update docs/schema.md, the human-readable schema documentation for every stored field, including DB-only ones not in any TOML.
Run pixi run -e api test -- tests/test_repo_consistency.py tests/test_generate_json_schema.py. The drift guards in these two test files fail with a fix hint if the generated schemas, starter copies, or docs are out of date.
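The idea behind a drift guard can be sketched in a few lines: regenerate the artifact in memory, compare it with the committed file, and return a hint when they differ. This is an illustration of the pattern, not the repo's actual test code; schema_drift is a hypothetical helper, and in practice the generated dict would presumably come from the Pydantic models in schema.py.

```python
import json
from pathlib import Path
from typing import Optional


def schema_drift(generated: dict, committed_path: Path) -> Optional[str]:
    """Return a fix hint if the committed JSON file differs from the generated schema."""
    if not committed_path.exists():
        return f"{committed_path} is missing; run `pixi run sync`"
    committed = json.loads(committed_path.read_text(encoding="utf-8"))
    if committed != generated:
        return f"{committed_path} is out of date; run `pixi run sync`"
    return None  # no drift
```

A test would then assert that schema_drift(...) is None and surface the returned string as the failure message. Comparing parsed JSON rather than raw text keeps the guard insensitive to whitespace and key order.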

Schema authoring & validation

For researchers writing sample.toml / acquisition.toml, the authoring guide is in docs/data_organization.md. Quick commands:

Command	What it does
`pixi run validate {sample_dir}`	Validate `sample.toml` and all `acquisition.toml` files under a sample directory.
`pixi run json-schema`	Regenerate `src/schema/schema.json` and `acquisition.schema.json` from the Pydantic models. Run after any change to `schema.py`.
`pixi run test`	Run the test suite.
