[DOM-75517] feat: EDA results DB model + diagnostics_data + CRUD enhancements #15
Open
ddl-subir-m wants to merge 3 commits into main
…ancements

- Add EDAResult model for DB-backed EDA storage (replaces file-based store)
- Add diagnostics_data JSON column to Job model
- Add summary_only query optimization with load_only() for job listing
- Add owner filtering on get_registered_models and get_jobs_for_cleanup
- Add EDA CRUD operations (create/get/update/write result/error/delete stale)
- Rewrite eda_job_store from file-backed to DB-backed (async)
- Add WebSocket broadcast helpers for job/log updates in CRUD layer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Mar 20, 2026
logger = logging.getLogger(__name__)


def _job_update_payload(job: Job, event_type: str = "job_update") -> dict:
Does anything in this PR require a Domino Job to have a connection to the sqlite db?
No, the Domino Jobs don't connect to it directly. The App server writes to sqlite before launching a job and after polling the job status. The job itself just runs the computation and the App picks up the results.
Why
EDA results in DB: EDA profiling results are currently stored as JSON files on the local filesystem. This has three problems: (1) results don't persist across App restarts since the container filesystem is ephemeral, (2) results can't be scoped by owner for RBAC enforcement, and (3) there's no way to clean up stale results. Moving to DB-backed storage solves all three.
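A minimal sketch of what DB-backed EDA storage buys over JSON files, using the standard-library sqlite3 module instead of the PR's actual ORM models. The column names follow the EDAResult fields listed in this PR; the request_id key, the updated_at timestamp, and all function names are assumptions made for illustration.

```python
import json
import sqlite3
import time

# Hypothetical schema: fields mirror the EDAResult model described in this PR,
# plus an assumed request_id primary key and updated_at timestamp.
SCHEMA = """
CREATE TABLE eda_results (
    request_id     TEXT PRIMARY KEY,
    status         TEXT NOT NULL,
    mode           TEXT,
    owner          TEXT,
    project_id     TEXT,
    result_payload TEXT,
    error          TEXT,
    updated_at     REAL NOT NULL
)
"""


def create_request(conn, request_id, owner, mode, project_id=None, now=None):
    """Insert a pending row before the profiling work starts."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT INTO eda_results "
        "(request_id, status, mode, owner, project_id, updated_at) "
        "VALUES (?, 'pending', ?, ?, ?, ?)",
        (request_id, mode, owner, project_id, now),
    )


def write_result(conn, request_id, payload, now=None):
    """Store the finished profile as JSON text and mark the row completed."""
    now = time.time() if now is None else now
    conn.execute(
        "UPDATE eda_results SET status = 'completed', result_payload = ?, "
        "updated_at = ? WHERE request_id = ?",
        (json.dumps(payload), now, request_id),
    )


def get_result(conn, request_id, owner):
    """Return the row only if it belongs to the given owner (owner scoping)."""
    row = conn.execute(
        "SELECT status, result_payload FROM eda_results "
        "WHERE request_id = ? AND owner = ?",
        (request_id, owner),
    ).fetchone()
    if row is None:
        return None
    status, payload = row
    return {"status": status, "result": json.loads(payload) if payload else None}


def delete_stale(conn, max_age_seconds, now=None):
    """Delete rows not updated within max_age_seconds; return how many were removed."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM eda_results WHERE updated_at < ?",
        (now - max_age_seconds,),
    )
    return cur.rowcount
```

Owner scoping and stale cleanup each reduce to a WHERE clause here, which is hard to express over a directory of JSON files. Persistence across restarts depends on where the database file lives, which this sketch does not model.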
diagnostics_data column: Model diagnostics (feature importance, learning curves, confusion matrix) are currently computed on-demand when the user views the diagnostics tab. For cross-project jobs where the model files live on a different mount, on-demand computation requires downloading the model first. Pre-computing during training and storing in the DB avoids this latency and makes diagnostics available even if the model mount isn't accessible.
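The pre-compute-and-store idea can be sketched as follows, again with plain sqlite3 rather than the PR's ORM layer. The jobs table layout and the store_diagnostics/load_diagnostics helpers are hypothetical; only the diagnostics_data column name comes from this PR.

```python
import json
import sqlite3

# Hypothetical, trimmed-down jobs table; only diagnostics_data is named in the PR.
JOBS_SCHEMA = """
CREATE TABLE jobs (
    id               TEXT PRIMARY KEY,
    name             TEXT,
    diagnostics_data TEXT
)
"""


def store_diagnostics(conn, job_id, diagnostics):
    """Called at the end of training: persist the pre-computed diagnostics as JSON."""
    conn.execute(
        "UPDATE jobs SET diagnostics_data = ? WHERE id = ?",
        (json.dumps(diagnostics), job_id),
    )


def load_diagnostics(conn, job_id):
    """Read diagnostics straight from the DB without touching model files.

    Returns None when the job is unknown or nothing was pre-computed, so a
    caller could fall back to on-demand computation in that case.
    """
    row = conn.execute(
        "SELECT diagnostics_data FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None or row[0] is None:
        return None
    return json.loads(row[0])
```

Returning None for missing data keeps the on-demand path available for jobs trained before the column existed.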
summary_only optimization: The job listing endpoint loads full JobResponse objects including leaderboard arrays and diagnostics blobs, even though the dashboard only shows name/status/score. The summary_only flag uses SQLAlchemy's load_only() to skip heavy columns, reducing list query time.
Owner filtering: get_registered_models() and get_jobs_for_cleanup() currently return all records regardless of who owns them. Adding owner filtering is a prerequisite for the RBAC enforcement PR.
Summary
- Add EDAResult DB model (status, mode, owner, project_id, result_payload, error)
- Add diagnostics_data JSON column to Job model
- Add summary_only query optimization with load_only() for job listing
- Add owner filtering on get_registered_models() and get_jobs_for_cleanup()
- Rewrite eda_job_store.py from file-backed to async DB-backed
Files changed
- app/db/models.py: new EDAResult model, diagnostics_data on Job
- app/db/database.py: migrations for new columns/tables
- app/db/crud.py: EDA CRUD ops, summary_only, owner filtering, WebSocket broadcasts
- app/core/eda_job_store.py: rewritten from file-backed to DB-backed (async)
- tests/test_eda_crud.py: EDA CRUD tests
Test plan
- test_eda_crud.py passes
- summary_only=True skips loading leaderboard and diagnostics columns
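For reviewers who want a feel for the second test-plan item, here is a rough illustration of the summary_only and owner-filtering behaviour. The PR does this with SQLAlchemy's load_only(); this sketch swaps in explicit column lists over plain sqlite3 to show the same effect without an ORM. The table layout, the list_jobs function, and the choice of summary columns are assumptions, not the PR's code.

```python
import sqlite3

# Hypothetical jobs table: light summary columns plus two heavy JSON-text columns.
JOBS_SCHEMA = """
CREATE TABLE jobs (
    id               INTEGER PRIMARY KEY,
    name             TEXT,
    status           TEXT,
    score            REAL,
    owner            TEXT,
    leaderboard      TEXT,
    diagnostics_data TEXT
)
"""

SUMMARY_COLUMNS = ("id", "name", "status", "score", "owner")
ALL_COLUMNS = SUMMARY_COLUMNS + ("leaderboard", "diagnostics_data")


def list_jobs(conn, owner=None, summary_only=False):
    """List jobs, optionally scoped to one owner and without the heavy columns."""
    # Column names come from the fixed tuples above, never from user input,
    # so building the SELECT list with a join is safe here.
    columns = SUMMARY_COLUMNS if summary_only else ALL_COLUMNS
    sql = f"SELECT {', '.join(columns)} FROM jobs"
    params = ()
    if owner is not None:
        sql += " WHERE owner = ?"  # owner value is passed as a bound parameter
        params = (owner,)
    sql += " ORDER BY id"
    rows = conn.execute(sql, params).fetchall()
    return [dict(zip(columns, row)) for row in rows]
```

With summary_only=True the heavy leaderboard and diagnostics values are never read from the table, which is the effect the test plan checks for.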