[DOM-75517] feat: EDA results DB model + diagnostics_data + CRUD enhancements by ddl-subir-m · Pull Request #15 · dominodatalab/AutoML_Extension

ddl-subir-m · 2026-03-20T14:46:31Z

Why

EDA results in DB: EDA profiling results are currently stored as JSON files on the local filesystem. This has three problems: (1) results don't persist across App restarts since the container filesystem is ephemeral, (2) results can't be scoped by owner for RBAC enforcement, and (3) there's no way to clean up stale results. Moving to DB-backed storage solves all three.
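A minimal sketch of that lifecycle, assuming a synchronous `sqlite3` connection for brevity (the PR's store is async, and its real schema and function names are not shown here). The column names follow the Summary below; the column types, the `updated_at` column, the `"tabular"` mode value, and all helper names are hypothetical.

```python
import json
import sqlite3
import time

# Hypothetical schema: column names follow the PR summary, types are assumed.
SCHEMA = """
CREATE TABLE eda_results (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    mode TEXT,
    owner TEXT,
    project_id TEXT,
    result_payload TEXT,
    error TEXT,
    updated_at REAL NOT NULL
)
"""


def create_request(conn, request_id, owner, mode, project_id=None, now=None):
    """Record a pending EDA request before any profiling runs."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT INTO eda_results (id, status, mode, owner, project_id, updated_at) "
        "VALUES (?, 'pending', ?, ?, ?, ?)",
        (request_id, mode, owner, project_id, now),
    )


def write_result(conn, request_id, payload, now=None):
    """Store the profiling output as JSON and mark the request completed."""
    now = time.time() if now is None else now
    conn.execute(
        "UPDATE eda_results SET status = 'completed', result_payload = ?, updated_at = ? "
        "WHERE id = ?",
        (json.dumps(payload), now, request_id),
    )


def get_result(conn, request_id, owner):
    """Return the result only if it belongs to the given owner."""
    row = conn.execute(
        "SELECT status, result_payload FROM eda_results WHERE id = ? AND owner = ?",
        (request_id, owner),
    ).fetchone()
    if row is None:
        return None
    status, payload = row
    return {"status": status, "result": json.loads(payload) if payload else None}


def delete_stale(conn, max_age_seconds, now=None):
    """Delete rows not updated within max_age_seconds; return how many were removed."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM eda_results WHERE updated_at < ?", (now - max_age_seconds,)
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
create_request(conn, "req-1", owner="alice", mode="tabular", now=1000.0)
write_result(conn, "req-1", {"rows": 120}, now=1010.0)
```

Because every read goes through an `owner` predicate and every row carries a timestamp, the same table can serve persistence, owner scoping, and stale cleanup.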

diagnostics_data column: Model diagnostics (feature importance, learning curves, confusion matrix) are currently computed on-demand when the user views the diagnostics tab. For cross-project jobs where the model files live on a different mount, on-demand computation requires downloading the model first. Pre-computing during training and storing in the DB avoids this latency and makes diagnostics available even if the model mount isn't accessible.
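The read path this enables might look roughly like the following sketch. It assumes the diagnostics are stored as a JSON string in a `diagnostics_data` column and that on-demand computation remains as a fallback; the fallback, the table layout, and the function names are assumptions, not taken from the PR.

```python
import json
import sqlite3


def finish_training(conn, job_id, diagnostics):
    """At the end of training, persist the pre-computed diagnostics with the job."""
    conn.execute(
        "INSERT INTO jobs (id, status, diagnostics_data) VALUES (?, 'completed', ?)",
        (job_id, json.dumps(diagnostics)),
    )


def get_diagnostics(conn, job_id, compute_on_demand):
    """Prefer the stored blob; only fall back to computing from model files."""
    row = conn.execute(
        "SELECT diagnostics_data FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is not None and row[0] is not None:
        return json.loads(row[0])
    # Hypothetical fallback: may need the model mount to be accessible.
    return compute_on_demand(job_id)


diag_conn = sqlite3.connect(":memory:")
diag_conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, diagnostics_data TEXT)"
)
finish_training(diag_conn, "job-1", {"feature_importance": {"age": 0.4}})

fallback_calls = []


def fake_compute(job_id):
    """Stand-in for the expensive on-demand path; records that it was used."""
    fallback_calls.append(job_id)
    return {"feature_importance": {}}
```

With the blob present, viewing the diagnostics tab becomes a single-row read that never touches the model files.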

summary_only optimization: The job listing endpoint loads full JobResponse objects including leaderboard arrays and diagnostics blobs, even though the dashboard only shows name/status/score. The summary_only flag uses SQLAlchemy's load_only() to skip heavy columns, reducing list query time.
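The PR does this with SQLAlchemy's `load_only()`. The sketch below swaps in plain `sqlite3` to show the underlying effect: the listing query selects only the light columns. The column lists and the `list_jobs` helper are hypothetical.

```python
import json
import sqlite3

# Light columns the dashboard needs vs. the full row (assumed names).
SUMMARY_COLUMNS = ("id", "name", "status", "score")
ALL_COLUMNS = SUMMARY_COLUMNS + ("leaderboard", "diagnostics_data")


def list_jobs(conn, summary_only=False):
    """List jobs, skipping the heavy JSON columns when summary_only is set."""
    columns = SUMMARY_COLUMNS if summary_only else ALL_COLUMNS
    # Column names come from the constants above, never from user input.
    cur = conn.execute(f"SELECT {', '.join(columns)} FROM jobs ORDER BY id")
    return [dict(zip(columns, row)) for row in cur.fetchall()]


jobs_conn = sqlite3.connect(":memory:")
jobs_conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, name TEXT, status TEXT, score REAL, "
    "leaderboard TEXT, diagnostics_data TEXT)"
)
jobs_conn.execute(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?, ?)",
    (
        "job-1",
        "churn",
        "completed",
        0.91,
        json.dumps([{"model": "gbm", "score": 0.91}]),
        json.dumps({"confusion_matrix": [[50, 3], [4, 43]]}),
    ),
)
```

In the summary path the large JSON values are never read from the table or deserialized, which is where the saving would come from.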

Owner filtering: get_registered_models() and get_jobs_for_cleanup() currently return all records regardless of who owns them. Adding owner filtering is a prerequisite for the RBAC enforcement PR.
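One way such a filter could be expressed is as an optional parameter, sketched below with plain `sqlite3`. The `registered_models` table, its columns, and the choice that `owner=None` returns every record are assumptions for illustration; the PR's actual signature may differ.

```python
import sqlite3


def get_registered_models(conn, owner=None):
    """Return (name, owner) rows, restricted to one owner when given."""
    sql = "SELECT name, owner FROM registered_models"
    params = ()
    if owner is not None:
        # Parameterized predicate: the owner value is never interpolated into SQL.
        sql += " WHERE owner = ?"
        params = (owner,)
    sql += " ORDER BY name"
    return conn.execute(sql, params).fetchall()


models_conn = sqlite3.connect(":memory:")
models_conn.execute("CREATE TABLE registered_models (name TEXT, owner TEXT)")
models_conn.executemany(
    "INSERT INTO registered_models VALUES (?, ?)",
    [("fraud-v1", "alice"), ("churn-v2", "bob"), ("churn-v1", "alice")],
)
```

Keeping the filter in the query layer means an RBAC check added later only has to pass the caller's identity through.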

Summary

Adds EDAResult DB model (status, mode, owner, project_id, result_payload, error)
Adds diagnostics_data JSON column to Job model
Adds summary_only query optimization with load_only() for job listing
Adds owner filtering on get_registered_models() and get_jobs_for_cleanup()
Adds EDA CRUD operations: create/get/update/write result/error/delete stale
Rewrites eda_job_store.py from file-backed to async DB-backed
Adds WebSocket broadcast helpers for job/log updates in the CRUD layer

Files changed

app/db/models.py — new EDAResult model, diagnostics_data on Job
app/db/database.py — migrations for new columns/tables
app/db/crud.py — EDA CRUD ops, summary_only, owner filtering, WebSocket broadcasts
app/core/eda_job_store.py — rewritten: file → DB-backed (async)
tests/test_eda_crud.py — EDA CRUD tests

Test plan

test_eda_crud.py passes
Existing job service tests still pass
EDA results persist across App restarts (DB-backed, not file-backed)
summary_only=True skips loading leaderboard and diagnostics columns

…ancements
- Add EDAResult model for DB-backed EDA storage (replaces file-based store)
- Add diagnostics_data JSON column to Job model
- Add summary_only query optimization with load_only() for job listing
- Add owner filtering on get_registered_models and get_jobs_for_cleanup
- Add EDA CRUD operations (create/get/update/write result/error/delete stale)
- Rewrite eda_job_store from file-backed to DB-backed (async)
- Add WebSocket broadcast helpers for job/log updates in CRUD layer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ddl-ryan-connor · 2026-03-24T17:42:14Z

automl-service/app/db/crud.py

+logger = logging.getLogger(__name__)
+
+
+def _job_update_payload(job: Job, event_type: str = "job_update") -> dict:


Does anything in this PR require a Domino Job to have a connection to the SQLite DB?

ddl-subir-m · 2026-03-24

No, the Domino Jobs don't connect to it directly. The App server writes to SQLite before launching a job and after polling the job status. The job itself just runs the computation, and the App picks up the results.
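The flow described in this reply can be sketched as follows. The `launch` and `poll` callables stand in for whatever the App uses to start and check a Domino Job; they, the `jobs` table layout, and the status strings are hypothetical. Only the App-side function ever holds the database connection.

```python
import sqlite3


def launch_and_track(conn, job_id, launch, poll):
    """App-side flow: write before launch, poll, then write the observed status."""
    # 1. The App records the job before anything is launched.
    conn.execute("INSERT INTO jobs (id, status) VALUES (?, 'pending')", (job_id,))
    # 2. The remote job is started without being handed a DB connection.
    run_id = launch(job_id)
    # 3. The App polls for status and persists what it sees.
    status = poll(run_id)
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    return status


def fake_launch(job_id):
    """Stand-in for starting a remote job; returns a made-up run identifier."""
    return f"run-{job_id}"


def fake_poll(run_id):
    """Stand-in for a status check; always reports success."""
    return "Succeeded"


app_conn = sqlite3.connect(":memory:")
app_conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT)")
final_status = launch_and_track(app_conn, "job-7", fake_launch, fake_poll)
```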

ddl-subir-m requested review from a team and niole March 20, 2026 14:46

This was referenced Mar 20, 2026

[DOM-75520] fix: enforce owner-based RBAC on all job endpoints #20

Open

[DOM-75523] feat: cross-project workers, pre-computed diagnostics, trainer improvements #22

Closed

ddl-subir-m changed the title ~~feat: EDA results DB model + diagnostics_data + CRUD enhancements~~ [DOM-75517] feat: EDA results DB model + diagnostics_data + CRUD enhancements Mar 23, 2026

ddl-subir-m added 2 commits March 24, 2026 11:27

Merge remote-tracking branch 'origin/main' into subir/pr4-db-eda-crud

537cace

Merge remote-tracking branch 'origin/main' into subir/pr4-db-eda-crud

8302758

ddl-ryan-connor reviewed Mar 24, 2026

ddl-subir-m wants to merge 3 commits into main from subir/pr4-db-eda-crud
