
[DOM-75569] feat: frontend overhaul - EDA, jobs, diagnostics & export UI#29

Open
ddl-subir-m wants to merge 242 commits into main from
subir/pr10b-frontend

Conversation


@ddl-subir-m ddl-subir-m commented Mar 24, 2026

Summary

  • EDA Analysis page with tabular/time series mode toggle and manual "Analyze" trigger
  • Job detail page with tab navigation (results, leaderboard, diagnostics)
  • Live job updates via useJobLiveUpdates hook, replacing simulated progress bars
  • Dataset hooks and data source selector for Domino datasets
  • Interactive leaderboard with normalized timing keys for time series models
  • Export dialog with Docker build command display
  • Time series config panel for column selection and prediction length
  • Debug logger utility
  • Removed deprecated components (Table, DataOverview, DataTable, InteractiveCharts, TimeSeriesForecastPanel, useJobProgress, pathDefaults)

Dependencies

Depends on PR #9b (subir/pr9b-workers-trainers) and all upstream PRs

Test plan

  • EDA page loads and toggles between tabular/time series modes
  • "Analyze" button triggers profiling and displays results
  • Job detail page shows correct tabs and live status updates
  • Dataset selector lists Domino datasets and handles uploads
  • Leaderboard renders with correct timing columns for both model types
  • Export dialog shows build command and downloads zip
  • No console errors from removed component imports

ddl-subir-m and others added 30 commits March 10, 2026 12:54
…aunch diagnostics

- Replace filesystem-only dataset listing with Domino Dataset RW v2 API
  (GET /api/datasetrw/v2/datasets?projectIdsToInclude=...) when a project
  ID is available via X-Project-Id header or DOMINO_PROJECT_ID env var
- Cross-reference API results with mounted filesystem paths for file
  discovery (preview/training still uses local mounts)
- Fall back to legacy filesystem scan when API is unavailable
- Add params support to domino_request() helper
- Thread project ID through dataset routes and compat endpoints
- Add diagnostic logging to job launch path to trace project ID flow
- Add scripts/diagnose_api_routing.py for testing proxy vs direct host
…nce at construction

The ApiClient singleton was reading ?projectId= from the URL once at
module load time. React Router's Navigate redirect strips query params
before the header could be reliably captured. Now the project ID is:
1. Cached eagerly at module evaluation time
2. Resolved dynamically on every API request
3. Synced from React Router search params via ProjectIdSync component
…query params

Domino's app proxy strips query parameters before serving the App,
so ?projectId=TARGET_ID never reaches the frontend JS. Hash fragments
(#projectId=TARGET_ID) are client-side only and survive proxy stripping.

The frontend now checks both ?projectId= and #projectId= at module load
time, in the per-request header injection, and in the React Router sync.
Domino loads Apps in an iframe with a clean internal URL — the user's
projectId (query param or hash) is on the parent frame, not the iframe.
Since both are same-origin, read window.parent.location as fallback.
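The resolution order described above (query param first, then hash fragment, since only the fragment survives the proxy) can be sketched as follows. The real code is TypeScript in the ApiClient; this is an illustrative Python sketch, and the function name is an assumption:

```python
from urllib.parse import urlparse, parse_qs

def resolve_project_id(url):
    """Extract projectId from either ?projectId= or #projectId=.

    Illustrative sketch (not the actual frontend code) of the fallback
    order: query parameter first, then hash fragment, which is
    client-side only and so survives the app proxy's query stripping.
    """
    parsed = urlparse(url)
    # 1. Standard query parameter: ?projectId=TARGET_ID
    query = parse_qs(parsed.query)
    if query.get("projectId"):
        return query["projectId"][0]
    # 2. Hash fragment: #projectId=TARGET_ID (never reaches the proxy)
    fragment = parse_qs(parsed.fragment)
    if fragment.get("projectId"):
        return fragment["projectId"][0]
    return None
```

The same parse runs at module load, per request, and in the router sync, so a project ID found in any of the three places wins.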
…austion

- Detect zombie local jobs (RUNNING in DB but no active asyncio task) on
  every job-list request and auto-mark them FAILED
- Wrap Domino job sync in try/except so a flaky API call cannot crash the
  job-list endpoint
- Mark Domino jobs stuck in RUNNING >1hr as FAILED on startup
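The two zombie checks above can be sketched as a single scan. Names and the dict-based job shape are assumptions for illustration, not the real service API:

```python
from datetime import datetime, timedelta, timezone

def find_zombie_jobs(db_jobs, active_task_ids, stuck_after=timedelta(hours=1)):
    """Return IDs of RUNNING jobs that should be auto-marked FAILED.

    Sketch of the checks described above:
      - local jobs RUNNING in the DB with no live asyncio task
      - Domino jobs stuck in RUNNING longer than `stuck_after` (1h default)
    `db_jobs` is a list of dicts with id/status/kind/started_at.
    """
    now = datetime.now(timezone.utc)
    zombies = []
    for job in db_jobs:
        if job["status"] != "RUNNING":
            continue
        if job["kind"] == "local" and job["id"] not in active_task_ids:
            zombies.append(job["id"])  # task died without updating the DB
        elif job["kind"] == "domino" and now - job["started_at"] > stuck_after:
            zombies.append(job["id"])  # stuck past the threshold
    return zombies
```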
Test scripts to discover and validate Domino Dataset RW v2 API capabilities:
- test_dataset_api.py: list, create, verify datasets and probe mount paths
- test_dataset_grant.py: test cross-project dataset sharing/grant workflows
- test_dataset_upload.py: probe snapshot, direct, presigned, and mount-write upload methods

ProjectStorageResolver service for auto-creating per-project automl-extension
datasets with in-memory caching and mount path probing.
- Use v1 endpoint for dataset creation (v2 POST returns 404)
- Use "name" field instead of "datasetName" for v1 create payload
- Unwrap v2 list response nested {"dataset": {...}} wrappers
- Add grant API call (POST v1 grants with DatasetRwEditor role)
- Use v1 for get-by-id (v2 returns 404)
- Update test script with correct endpoints and payload shapes
- Implement upload_file() in storage_resolver using Domino's v4
  chunked upload API (same workflow as python-domino SDK)
- Support files of any size via automatic chunking (default 8MB)
- Per-chunk retry with exponential backoff (up to 10 attempts)
- Auto-cancel upload session on failure
- Add files and headers support to domino_request()
- Update test_dataset_upload.py with working v4 upload probe
- Fix v2 response unwrapping and use v1 for get-by-id in test script
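The chunking and per-chunk retry loop described above can be sketched like this. The Domino v4 API call is abstracted into an injected `send_chunk` callable; chunk size and attempt count match the commit message, everything else is illustrative:

```python
import time

CHUNK_SIZE = 8 * 1024 * 1024  # default 8MB chunks, as described above
MAX_ATTEMPTS = 10             # per-chunk retry budget

def upload_chunks(data, send_chunk, chunk_size=CHUNK_SIZE, sleep=time.sleep):
    """Split `data` into chunks and push each via send_chunk(index, chunk).

    Sketch of the per-chunk retry with exponential backoff; the real
    implementation targets Domino's v4 chunked upload API. Returns the
    number of chunks sent; raises after MAX_ATTEMPTS failures on a chunk
    (the caller then cancels the upload session).
    """
    total = (len(data) + chunk_size - 1) // chunk_size
    for index in range(total):
        chunk = data[index * chunk_size:(index + 1) * chunk_size]
        for attempt in range(MAX_ATTEMPTS):
            try:
                send_chunk(index, chunk)
                break
            except Exception:
                if attempt == MAX_ATTEMPTS - 1:
                    raise
                sleep(2 ** attempt)  # exponential backoff between retries
    return total
```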
…-local

- Storage resolver wired into uploads, training, and health endpoints
- Prediction service probes dataset mounts to find models
- EDA job store resolves base_dir from dataset mount when available
- Orphan cleanup scans dataset mount paths in addition to app-local dirs
- Add dataset file download endpoint and chunked upload support
Route all temp file creation (leaderboard, feature importance, MLflow
model staging, model exports) through project dataset mounts so artifacts
land on persistent storage. Also adds mount timing test script that
confirmed dataset mounts require an app restart to appear.
Dataset mounts are resolved at boot time, so creating the dataset
from inside a running job means the mount is never available. Move
dataset creation to before job launch so the Domino Job boots with
the mount already present. Also fix EDA async profiling to pass
project_id from X-Project-Id header and pre-create the dataset.
The upload endpoint previously called resolve_project_paths() which
raises 503 if the dataset mount is not yet available. On first-time
upload the mount won't exist until the app restarts. Now uses
check_project_storage() with graceful fallback to settings.uploads_path,
and pre-creates the dataset for future Jobs/restarts.
…l mount

When a target project_id is present, read the uploaded file into memory,
extract metadata from the buffer, and upload to the automl-extension
dataset via the v4 chunked API. This removes the dependency on having the
dataset mount available in the app container, making cross-project uploads
reliable. Falls back to local disk for standalone/no-project mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the app runs outside the target project, dataset mount paths don't
exist locally. This adds transparent download-on-demand: ensure_local_file()
checks if a dataset mount path exists, and if not, downloads the file via
the Domino Dataset RW API to a local cache before profiling or training.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
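The download-on-demand flow can be sketched as below. The function name matches the commit; the signature, cache-key scheme, and injected `download` callable are assumptions for illustration. The fail-soft branch reflects the later commit that makes `ensure_local_file()` return the original path instead of propagating download errors:

```python
import os

def ensure_local_file(path, cache_dir, download):
    """Return a locally readable path for `path`.

    Illustrative sketch: if the dataset mount path exists locally, use it;
    otherwise download into a cache keyed by the original path and reuse
    the cached copy on later calls. On download failure, fall back to the
    original path so the caller gets a clearer downstream error.
    """
    if os.path.exists(path):
        return path  # mount is available locally, nothing to do
    cache_key = path.strip("/").replace("/", "_")
    cached = os.path.join(cache_dir, cache_key)
    if not os.path.exists(cached):
        try:
            data = download(path)  # fetch via the Dataset RW API
        except Exception:
            return path  # fail soft instead of propagating
        os.makedirs(cache_dir, exist_ok=True)
        with open(cached, "wb") as fh:
            fh.write(data)
    return cached
```

Note that the same cache-key scheme is what lets the upload path (see the write-through commit below in the log) pre-populate the cache, since the API download later turned out not to exist.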
Add /export/deployment/zip that streams a zip directly from dataset mount
without intermediate copies. Simplify the UI to a single download button
(no output dir input needed). Make output_dir optional on the existing
build endpoint. Remove unused diagnostic and test scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add pytest suites covering:
- domino_download() streaming, domino_request() retries, auth headers
- StorageResolver: download_file(), _find_existing(), snapshots, grants, cache
- ensure_local_file(): local/remap/download branches, caching, fallbacks
- /export/deployment/zip: zip helpers and API endpoint integration

53 tests, all passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standalone script that tests the full dataset lifecycle against a live
Domino environment: auth, create, grant, upload, snapshots, list files,
download (probes endpoints), and cleanup. Confirms file download via API
is not supported (404 on all endpoints) — files only accessible via mounts.

Run: python scripts/test_domino_api_live.py [--keep] [--project-id ID]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Systematically probes 60+ endpoint patterns across v1/v2/v4 and alternative
paths. All return 404 — Domino Dataset RW API has no file read/download
capability. Files are only accessible via mount paths in Domino Jobs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… API

Domino Dataset RW API has no file read/download endpoints (confirmed by
probing 60+ URL patterns). Files uploaded to the dataset are only accessible
via mount paths in Domino Jobs, not in the app.

Fix: during upload, also write the file bytes to the local dataset_cache
directory using the same cache key scheme that ensure_local_file() uses.
When profiling or local training calls ensure_local_file(), it finds the
cached copy immediately.

Also make ensure_local_file() gracefully handle download failures instead
of propagating the exception, returning the original path for a clearer
downstream error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete cached dataset files older than 24 hours during app startup to
prevent indefinite accumulation on disk. Profiling and training happen
within minutes of upload, so 24h gives ample margin.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
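The startup cleanup can be sketched in a few lines; the function name is an assumption, the 24-hour threshold is from the commit:

```python
import os
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # 24h retention, as described above

def prune_dataset_cache(cache_dir, now=None):
    """Delete cached dataset files older than 24 hours.

    Illustrative sketch of the startup cleanup. Returns the number of
    files removed; a missing cache dir is treated as already clean.
    """
    now = time.time() if now is None else now
    if not os.path.isdir(cache_dir):
        return 0
    removed = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > MAX_AGE_SECONDS:
            os.remove(path)
            removed += 1
    return removed
```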
…ore jobs

After uploading a file to a Domino Dataset via the chunked API, poll the
snapshot endpoint to confirm the file is committed before allowing the user
to proceed to profiling or training. This prevents Domino Jobs from failing
because the file hasn't materialized in the mount yet.

Backend: inline polling (~15s) during upload, plus a GET /verify-snapshot
endpoint for frontend fallback polling. Frontend: useSnapshotVerification
hook gates setDataSource/setSelectedFilePath behind verification, with
"Proceed Anyway" escape hatch on timeout.

Fixes: route ordering (verify-snapshot above /{dataset_id} catch-all),
setTimeout chain instead of setInterval to prevent overlapping async polls,
snapshot_file_path for consistent path matching, reduced inline backoff to
avoid proxy timeouts, and Proceed Anyway escape hatch in EDA page.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
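The inline backend polling (~15s budget, backoff capped to avoid proxy timeouts) can be sketched as follows. `get_latest_status` stands in for the Domino snapshot status call; names, delays, and the cap are illustrative assumptions:

```python
import time

def wait_for_snapshot(get_latest_status, timeout=15.0, initial=0.5,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll snapshot status until it reports 'active' or the budget expires.

    Sketch of the inline verification described above. Returns True when
    the snapshot is committed, False on timeout, in which case the
    frontend falls back to its own polling with a "Proceed Anyway"
    escape hatch.
    """
    deadline = clock() + timeout
    delay = initial
    while clock() < deadline:
        if get_latest_status() == "active":
            return True
        sleep(delay)
        delay = min(delay * 2, 4.0)  # modest backoff, capped for proxies
    return False
```

The frontend hook uses a setTimeout chain for the same reason the backend caps its backoff: a fixed setInterval can stack overlapping async polls when responses are slow.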
…nds automl.db

When a training/EDA job runs in the target project, DOMINO_PROJECT_NAME
points to that project, causing config.py to derive the wrong DB path.
Pass the app's database_url as a --database-url CLI arg to the runners,
which set it as an env var before any app imports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
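The runner-side handling can be sketched as below. The key constraint from the commit is that DATABASE_URL must be set before any app imports so config.py never derives a path from DOMINO_PROJECT_NAME; the function name and argv handling are assumptions:

```python
import os

def apply_database_url_arg(argv):
    """Consume a --database-url CLI arg and export it as DATABASE_URL.

    Illustrative sketch: must run before any app module is imported, so
    config.py sees the env var instead of deriving a wrong path from
    DOMINO_PROJECT_NAME in the target project. Returns argv with the
    flag removed.
    """
    rest = []
    it = iter(argv)
    for arg in it:
        if arg == "--database-url":
            os.environ["DATABASE_URL"] = next(it)  # pre-import, critically
        elif arg.startswith("--database-url="):
            os.environ["DATABASE_URL"] = arg.split("=", 1)[1]
        else:
            rest.append(arg)
    return rest
```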
Covers arg parsing, env var injection, command building, and config
preservation to verify the cross-project DATABASE_URL fix (ae7f6e2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l DB read

When launching a Domino training job, serialize the complete job config
to JSON and pass it as a CLI arg so the runner can skip the cross-project
DB read on startup. Falls back to DB read when --job-config is absent
(local execution path). DB is still used for writes (progress, status,
results, logs, cancellation checks).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Child jobs receive a sqlite database URL from the app project, but the
target project may mount the shared dataset at a different path depending
on whether it's DFS or git-based. Add project type detection via
filesystem probes and remap the database URL to the correct mount path
before setting DATABASE_URL in the environment.

Also adds /mnt/imported/data/ to _MOUNT_ROOTS in utils.py so
remap_shared_path() works for git-based target projects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
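The remap logic can be sketched as a probe over candidate mount roots. The `/mnt/imported/data/` root is from the commit; the other roots, the injected `exists` probe, and the function shape are illustrative assumptions:

```python
# Candidate roots where a shared dataset may be mounted in the target
# project; /mnt/imported/data/ covers git-based projects, per the commit.
_MOUNT_ROOTS = ["/mnt/data/", "/domino/datasets/", "/mnt/imported/data/"]

def remap_shared_path(path, exists):
    """Remap a path from the app project's mount root to one that exists
    in the current (target) project.

    Illustrative sketch: strip the known root off `path`, then probe each
    candidate root via `exists` (injected in place of os.path.exists for
    testability). Falls back to the original path when nothing matches.
    """
    suffix = None
    for root in _MOUNT_ROOTS:
        if path.startswith(root):
            suffix = path[len(root):]
            break
    if suffix is None:
        return path  # not under a known mount root; leave untouched
    for root in _MOUNT_ROOTS:
        candidate = root + suffix
        if exists(candidate):
            return candidate
    return path
```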
…unch flows

Tests the full API request path with mocked Domino APIs to verify:
- Training job launch command includes --database-url and --job-config
- Async EDA launch command includes --database-url
- Runner DB URL remap resolves across git and DFS mount points
- remap_shared_path covers /mnt/imported/data/ mount root

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add AUTOML_DEBUG_LOGGING (backend) and VITE_DEBUG_LOGGING (frontend) env
vars for verbose request/response logging during Domino app debugging.
Both default to false for production.

Backend: new DebugLoggingMiddleware logs method, URL, headers, body, and
timing for every request when enabled. Frontend: new debug logger utility
wraps API client calls with grouped console output including timing.
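The env-var gate can be sketched as a wrapper; the real backend uses an ASGI middleware, so this decorator-style version is purely illustrative, with only the AUTOML_DEBUG_LOGGING name taken from the commit:

```python
import os
import time

def debug_log(handler, logger):
    """Wrap a request handler with opt-in debug logging.

    Sketch of the AUTOML_DEBUG_LOGGING gate: logging is a no-op unless
    the env var is "true", so production requests pay no cost. Logs
    method, URL, body, and elapsed time, mirroring the middleware.
    """
    def wrapped(method, url, body=None):
        if os.environ.get("AUTOML_DEBUG_LOGGING", "false").lower() != "true":
            return handler(method, url, body)
        start = time.perf_counter()
        logger(f"--> {method} {url} body={body!r}")
        response = handler(method, url, body)
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger(f"<-- {method} {url} {elapsed_ms:.1f}ms")
        return response
    return wrapped
```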

Update both READMEs to reflect current codebase structure: new compat/
directory, core services, serving layer, resolvers, scripts, and all
missing env vars. Remove dead test_dataset_grant.py script.
…t logic

Replace file-listing-based snapshot verification with snapshot status
check. The Domino Dataset RW API has no file-listing endpoint, so all
list_files() calls returned 404. Now checks if the latest snapshot
status is "active" which correctly indicates the upload commit completed.

Remove _grant_project_access() — the grants API takes user IDs not
project IDs, and cross-project grants aren't needed since local training
uses the cached file and Domino Jobs run in the target project.
ddl-subir-m and others added 10 commits March 23, 2026 23:58
…ancements

- Add EDAResult model for DB-backed EDA storage (replaces file-based store)
- Add diagnostics_data JSON column to Job model
- Add summary_only query optimization with load_only() for job listing
- Add owner filtering on get_registered_models and get_jobs_for_cleanup
- Add EDA CRUD operations (create/get/update/write result/error/delete stale)
- Rewrite eda_job_store from file-backed to DB-backed (async)
- Add WebSocket broadcast helpers for job/log updates in CRUD layer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security fix: all job endpoints now enforce ownership.

- Add _enforce_job_owner() — returns 404 to non-owners (avoids leaking job existence)
- Invert get_request_owner() priority: prefer domino-username header over sidecar
  (sidecar returns App owner, not viewing user)
- resolve_job_list_filters() ignores client-supplied owner (always server-side)
- Add needs_request flag in compat patterns for Request forwarding
- Owner filtering on cleanup, registered models
- Add JobListItemResponse lightweight schema for list views
- Add clear_viewing_user() to prevent cross-request context leakage
- Background Domino sync throttling, zombie local job detection
- Add leaderboard_utils for normalize_leaderboard_payload

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
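The 404-to-non-owners behaviour can be sketched as below (a later merge commit replaces this helper with PR #30's patterns, but the principle carries over). The exception class and dict-based job are illustrative stand-ins for the real FastAPI pieces:

```python
class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException in this sketch."""
    def __init__(self, status):
        super().__init__(status)
        self.status = status

def enforce_job_owner(job, viewing_user):
    """Raise 404 (not 403) when `viewing_user` does not own `job`.

    Returning 404 for both "missing" and "not yours" means a non-owner
    cannot distinguish the two, so job existence is never leaked.
    """
    if job is None or job.get("owner") != viewing_user:
        raise HTTPError(404)
    return job
```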
… subir/replace-domino-sdk-with-api

# Conflicts:
#	automl-service/app/api/compat/custom_datasets.py
#	automl-service/app/api/routes/datasets.py
#	automl-service/app/api/utils.py
#	automl-service/app/core/dataset_manager.py
#	automl-service/app/core/utils.py
#	automl-service/app/services/domino_dataset_api.py
#	automl-service/app/services/storage_resolver.py
#	automl-service/tests/test_api_utils.py
#	automl-service/tests/test_dataset_manager.py
#	automl-service/tests/test_domino_dataset_api.py
#	automl-service/tests/test_storage_resolver.py
#	automl-service/tests/test_storage_resolver_extras.py
…/replace-domino-sdk-with-api

# Conflicts:
#	automl-service/app/api/routes/profiling.py
#	automl-service/app/core/domino_job_launcher.py
#	automl-service/app/main.py
#	automl-service/tests/test_domino_job_launcher.py
- Training worker with data-path resolution, config deserialization,
  cancellation checks, and step-by-step progress tracking
- Job queue manager with concurrency limiting, recovery, and graceful shutdown
- Prediction service with pre-computed diagnostics and time series support
- Model export with Docker zip packaging
- Leaderboard normalization for time series timing keys
- Cross-project DB URL remap and dataset file resolution
- 11 new test modules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EDA Analysis page with tabular/time series mode toggle and manual trigger
- Job detail page with tab navigation (results/leaderboard/diagnostics)
- Dataset hooks and data source selector for Domino datasets
- Live job updates replacing simulated progress bars
- Interactive leaderboard with normalized timing keys
- Export dialog with Docker build command display
- Time series config panel for column selection
- Debug logger utility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Table.tsx, DataOverview.tsx, DataTable.tsx, InteractiveCharts.tsx (replaced by inline SVG charts)
- TimeSeriesForecastPanel.tsx (consolidated into EDA flow)
- useJobProgress.ts (replaced by useJobLiveUpdates)
- pathDefaults.ts, eda/index.ts barrel export (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ddl-subir-m ddl-subir-m requested review from a team and niole March 24, 2026 14:55
@ddl-subir-m ddl-subir-m changed the title feat: frontend overhaul — EDA, jobs, diagnostics & export UI feat: frontend overhaul - EDA, jobs, diagnostics & export UI Mar 24, 2026
@ddl-subir-m ddl-subir-m changed the title feat: frontend overhaul - EDA, jobs, diagnostics & export UI [DOM-75569] feat: frontend overhaul - EDA, jobs, diagnostics & export UI Mar 24, 2026
ddl-subir-m and others added 16 commits March 24, 2026 11:26
…sdk-with-api

# Conflicts:
#	automl-service/app/api/routes/health.py
#	automl-service/app/api/routes/jobs.py
#	automl-service/app/main.py
…ed client

Per Niole's review:
- Remove _remap_db_url_for_target (eliminating imported data pattern)
- Remove multi-base-URL fallback, use DOMINO_API_HOST directly
- Replace raw domino_request() calls with generated public API client
  for job start (start_job) and status (get_job_details)
- Keep domino_request only for v4 stop (no public API alternative)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	automl-service/app/api/routes/jobs.py
#	automl-service/app/main.py
# Conflicts:
#	automl-service/app/api/routes/health.py
#	automl-service/app/core/domino_job_launcher.py
…h-api

# Conflicts:
#	automl-service/app/api/routes/jobs.py
#	automl-service/app/core/domino_job_launcher.py
…ainers

# Conflicts:
#	automl-service/app/api/utils.py
#	automl-service/tests/test_api_utils.py
The comment claimed the sidecar token is used as fallback, but the code
never falls back to the sidecar. Clarify that missing user tokens cause
MissingUserTokenError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…h-api

# Conflicts:
#	automl-service/app/api/utils.py
Resolve merge conflicts with Niole's PR #30 (job auth via authorized
actions).  Adopt get_viewing_user_name(), require_job_list(), and the
dual-track get_job_or_404 (local owner check / Domino API check) while
keeping our unique additions: zombie local-job detection, summary_only
query optimization, build_job_list_item_response, and cleanup owner
scoping.  Remove get_request_owner / _enforce_job_owner in favor of
the PR #30 patterns.  Add from __future__ import annotations to
job_service.py for Python 3.9 compat with | union syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Job service functions (get_job_response, cancel_job, delete_job, etc.)
accept (db, job_id) only. The compat route registrations incorrectly had
needs_request=True, causing patterns.py to pass request=request which
triggered TypeError on /svcjobget, /svcjobcancel, etc.
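The dispatch mismatch reduces to this. A minimal sketch, with the dispatcher shape assumed; the fix in the commit is simply registering (db, job_id)-only handlers with needs_request=False:

```python
def call_compat_handler(handler, needs_request, db, job_id, request=None):
    """Dispatch a compat route to its service function.

    Sketch of the patterns.py dispatch: with needs_request=True the
    dispatcher passes request=request, so a handler that accepts only
    (db, job_id) raises TypeError, which is exactly the /svcjobget and
    /svcjobcancel failure described above.
    """
    if needs_request:
        return handler(db, job_id, request=request)
    return handler(db, job_id)
```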
