[DOM-75517] feat: EDA results DB model + diagnostics_data + CRUD enhancements #15

Open
ddl-subir-m wants to merge 3 commits into main from subir/pr4-db-eda-crud
Conversation

ddl-subir-m commented Mar 20, 2026

Why

EDA results in DB: EDA profiling results are currently stored as JSON files on the local filesystem. This has three problems: (1) results don't persist across App restarts since the container filesystem is ephemeral, (2) results can't be scoped by owner for RBAC enforcement, and (3) there's no way to clean up stale results. Moving to DB-backed storage solves all three.
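As a rough illustration of how DB-backed storage could address all three problems, the sketch below uses Python's built-in sqlite3 module rather than the project's actual model and CRUD layer. The status, mode, owner, project_id, result_payload, and error columns follow the EDAResult fields listed in the Summary; the table name, the id and updated_at columns, the status strings, the "quick" mode value, and every function name are assumptions made for this example.

```python
import json
import sqlite3
import time

# Assumed schema: status/mode/owner/project_id/result_payload/error come from
# the PR summary; id, updated_at, and the table name are illustrative guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS eda_results (
    id             TEXT PRIMARY KEY,
    status         TEXT NOT NULL,
    mode           TEXT,
    owner          TEXT,
    project_id     TEXT,
    result_payload TEXT,
    error          TEXT,
    updated_at     REAL NOT NULL
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def create_eda_result(conn, result_id, owner, project_id, mode, now=None):
    """Insert a pending row before profiling starts."""
    ts = time.time() if now is None else now
    conn.execute(
        "INSERT INTO eda_results (id, status, mode, owner, project_id, updated_at) "
        "VALUES (?, 'pending', ?, ?, ?, ?)",
        (result_id, mode, owner, project_id, ts),
    )
    conn.commit()


def write_eda_result(conn, result_id, payload, now=None):
    """Store the finished profile as JSON and mark the row completed."""
    ts = time.time() if now is None else now
    conn.execute(
        "UPDATE eda_results SET status = 'completed', result_payload = ?, "
        "updated_at = ? WHERE id = ?",
        (json.dumps(payload), ts, result_id),
    )
    conn.commit()


def get_eda_result(conn, result_id, owner):
    """Owner-scoped read: another user's result looks like a missing row."""
    row = conn.execute(
        "SELECT * FROM eda_results WHERE id = ? AND owner = ?",
        (result_id, owner),
    ).fetchone()
    if row is None:
        return None
    result = dict(row)
    if result["result_payload"] is not None:
        result["result_payload"] = json.loads(result["result_payload"])
    return result


def delete_stale_eda_results(conn, max_age_seconds, now=None):
    """Delete rows not updated within max_age_seconds; return the count."""
    ts = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM eda_results WHERE updated_at < ?",
        (ts - max_age_seconds,),
    )
    conn.commit()
    return cur.rowcount
```

Scoping the read by owner in the query itself, rather than filtering after the fetch, means a caller cannot distinguish "not yours" from "does not exist", which is one common choice for RBAC-style lookups.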

diagnostics_data column: Model diagnostics (feature importance, learning curves, confusion matrix) are currently computed on-demand when the user views the diagnostics tab. For cross-project jobs where the model files live on a different mount, on-demand computation requires downloading the model first. Pre-computing during training and storing in the DB avoids this latency and makes diagnostics available even if the model mount isn't accessible.
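The pre-compute-then-read pattern described above might look roughly like the following. This is a sketch using the standard-library sqlite3 module, not the PR's code: only the diagnostics_data column name comes from the PR, while the table layout, the function names, and the optional on-demand fallback are assumptions.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    # Minimal stand-in for the Job table; only diagnostics_data is from the PR.
    conn.execute(
        "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, diagnostics_data TEXT)"
    )
    return conn


def finish_training(conn, job_id, diagnostics):
    """Called once at the end of training, while the model is still at hand."""
    conn.execute(
        "INSERT INTO jobs (id, status, diagnostics_data) VALUES (?, 'completed', ?)",
        (job_id, json.dumps(diagnostics)),
    )
    conn.commit()


def get_diagnostics(conn, job_id, compute_on_demand=None):
    """Prefer the stored blob; fall back to on-demand computation if absent.

    The fallback is hypothetical: the PR does not say whether the old
    on-demand path is kept.
    """
    row = conn.execute(
        "SELECT diagnostics_data FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is not None and row[0] is not None:
        return json.loads(row[0])
    if compute_on_demand is not None:
        return compute_on_demand(job_id)
    return None
```

With the blob stored on the job row, the diagnostics tab needs only a DB read, so it does not depend on the model mount being reachable.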

summary_only optimization: The job listing endpoint loads full JobResponse objects, including leaderboard arrays and diagnostics blobs, even though the dashboard only shows name, status, and score. The summary_only flag uses SQLAlchemy's load_only() to skip the heavy columns, so list queries no longer fetch data the dashboard never displays.
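In SQLAlchemy, load_only() restricts which columns are loaded for an entity and defers the rest. The sketch below shows the same idea at the SQL level with the standard-library sqlite3 module, as a stand-in for the ORM call: the summary path selects only the light columns. The name, status, and score columns follow the description above; the id and leaderboard column names, the table layout, and the list_jobs function are assumptions.

```python
import sqlite3

# Light columns the dashboard needs (name/status/score per the PR text);
# "id" and the heavy column names below are assumptions.
SUMMARY_COLUMNS = ("id", "name", "status", "score")
ALL_COLUMNS = SUMMARY_COLUMNS + ("leaderboard", "diagnostics_data")


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE jobs (id TEXT PRIMARY KEY, name TEXT, status TEXT, "
        "score REAL, leaderboard TEXT, diagnostics_data TEXT)"
    )
    return conn


def list_jobs(conn, summary_only=False):
    """Return job rows; with summary_only, heavy columns are never read."""
    columns = SUMMARY_COLUMNS if summary_only else ALL_COLUMNS
    # Column names come from the constants above, never from user input.
    sql = f"SELECT {', '.join(columns)} FROM jobs ORDER BY id"
    return [dict(row) for row in conn.execute(sql)]
```

One difference worth keeping in mind: with the ORM, a column skipped by load_only() is deferred and would be fetched lazily if the attribute were accessed later, whereas the plain SQL version simply omits it from the result.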

Owner filtering: get_registered_models() and get_jobs_for_cleanup() currently return all records regardless of who owns them. Adding owner filtering is a prerequisite for the RBAC enforcement PR.
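A minimal sketch of what an optional owner filter could look like, again using sqlite3 instead of the project's ORM. The function name get_registered_models comes from the PR; its parameters, the table layout, and the choice to treat owner=None as "no filter" are assumptions.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE registered_models (name TEXT, owner TEXT)")
    return conn


def get_registered_models(conn, owner=None):
    """List models; owner=None keeps the old return-everything behaviour."""
    sql = "SELECT name, owner FROM registered_models"
    params = ()
    if owner is not None:
        # Parameterized filter, so the owner value is never spliced into SQL.
        sql += " WHERE owner = ?"
        params = (owner,)
    sql += " ORDER BY name"
    return [dict(row) for row in conn.execute(sql, params)]
```

Keeping the filter optional lets existing callers work unchanged until the RBAC enforcement PR starts passing an owner everywhere.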

Summary

  • Adds EDAResult DB model (status, mode, owner, project_id, result_payload, error)
  • Adds diagnostics_data JSON column to Job model
  • Adds summary_only query optimization with load_only() for job listing
  • Adds owner filtering on get_registered_models() and get_jobs_for_cleanup()
  • Adds EDA CRUD operations: create/get/update/write result/error/delete stale
  • Rewrites eda_job_store.py from file-backed to async DB-backed
  • Adds WebSocket broadcast helpers for job/log updates in the CRUD layer
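The broadcast helpers in the last bullet might follow a shape like the one below. Only the _job_update_payload signature appears in this PR's diff context; the Job dataclass, the payload fields, the Broadcaster class, and update_job_status are hypothetical, and asyncio queues stand in for real WebSocket connections.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical stand-in for the ORM Job model.
    id: str
    name: str
    status: str


def _job_update_payload(job: Job, event_type: str = "job_update") -> dict:
    # Payload shape is an assumption; only the signature appears in the PR.
    return {
        "type": event_type,
        "job": {"id": job.id, "name": job.name, "status": job.status},
    }


class Broadcaster:
    """Fans out payloads; each queue stands in for one WebSocket client."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self) -> asyncio.Queue:
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    async def broadcast(self, payload: dict) -> int:
        # Iterate over a copy so subscribers can leave during a broadcast.
        for queue in list(self._subscribers):
            await queue.put(payload)
        return len(self._subscribers)


async def update_job_status(broadcaster: Broadcaster, job: Job, status: str) -> int:
    """CRUD-layer style update: change the record, then notify listeners."""
    job.status = status
    # A real implementation would commit to the DB here before broadcasting.
    return await broadcaster.broadcast(_job_update_payload(job))
```

Broadcasting from the CRUD layer means every code path that changes a job also notifies listeners, at the cost of coupling persistence code to the notification mechanism.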

Files changed

  • app/db/models.py — new EDAResult model, diagnostics_data on Job
  • app/db/database.py — migrations for new columns/tables
  • app/db/crud.py — EDA CRUD ops, summary_only, owner filtering, WebSocket broadcasts
  • app/core/eda_job_store.py — rewritten: file → DB-backed (async)
  • tests/test_eda_crud.py — EDA CRUD tests

Test plan

  • test_eda_crud.py passes
  • Existing job service tests still pass
  • EDA results persist across App restarts (DB-backed, not file-backed)
  • summary_only=True skips loading leaderboard and diagnostics columns
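The restart-persistence item above could be checked with a test along these lines. This is a sketch, not the contents of tests/test_eda_crud.py: the helper names and table layout are hypothetical, and opening a fresh connection per call is used to imitate a restarted App. It only demonstrates persistence when the database file lives on storage that outlives the container, which this PR description does not specify.

```python
import os
import sqlite3
import tempfile


def save_result(db_path, result_id, payload):
    """Write one result using a short-lived connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS eda_results "
            "(id TEXT PRIMARY KEY, result_payload TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO eda_results VALUES (?, ?)",
            (result_id, payload),
        )
        conn.commit()
    finally:
        conn.close()


def load_result(db_path, result_id):
    """Read one result using a brand-new connection."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT result_payload FROM eda_results WHERE id = ?",
            (result_id,),
        ).fetchone()
        return None if row is None else row[0]
    finally:
        conn.close()


def test_result_survives_restart():
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "app.db")
        save_result(db_path, "r1", '{"rows": 10}')  # first "process"
        # The read below shares no in-memory state with the write above.
        assert load_result(db_path, "r1") == '{"rows": 10}'
```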

…ancements

- Add EDAResult model for DB-backed EDA storage (replaces file-based store)
- Add diagnostics_data JSON column to Job model
- Add summary_only query optimization with load_only() for job listing
- Add owner filtering on get_registered_models and get_jobs_for_cleanup
- Add EDA CRUD operations (create/get/update/write result/error/delete stale)
- Rewrite eda_job_store from file-backed to DB-backed (async)
- Add WebSocket broadcast helpers for job/log updates in CRUD layer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ddl-subir-m requested review from a team and niole on March 20, 2026, 14:46
ddl-subir-m changed the title from "feat: EDA results DB model + diagnostics_data + CRUD enhancements" to "[DOM-75517] feat: EDA results DB model + diagnostics_data + CRUD enhancements" on Mar 23, 2026
logger = logging.getLogger(__name__)


def _job_update_payload(job: Job, event_type: str = "job_update") -> dict:
Does anything in this PR require a Domino Job to have a connection to the SQLite DB?

Collaborator Author

No, the Domino Jobs don't connect to it directly. The App server writes to SQLite before launching a job and again after polling the job status. The job itself just runs the computation, and the App picks up the results.
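The division of labour described in this reply can be sketched as follows. Everything here is hypothetical scaffolding: FakeJobClient stands in for whatever the App uses to launch and poll Domino Jobs, and the jobs table and status strings are assumptions. The point is only that every DB write happens on the App side.

```python
import sqlite3


class FakeJobClient:
    """Hypothetical stand-in for the client that launches and polls jobs."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def launch(self, job_id):
        return f"run-{job_id}"

    def poll(self, run_id):
        return next(self._statuses)


def run_job(conn, client, job_id, max_polls=10):
    """App-side flow: write, launch, then poll and write again."""
    status = "pending"
    # 1. The App records the job before launching it.
    conn.execute("INSERT INTO jobs (id, status) VALUES (?, ?)", (job_id, status))
    conn.commit()
    run_id = client.launch(job_id)
    # 2. The job runs elsewhere with no DB access; the App polls it and
    #    writes each observed status. A real loop would sleep between polls.
    for _ in range(max_polls):
        status = client.poll(run_id)
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
        conn.commit()
        if status in ("completed", "failed"):
            break
    return status
```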
