[DOM-75519] feat: migrate job launcher from v4 to v1/beta Domino Jobs API #19

Open
ddl-subir-m wants to merge 17 commits into subir/pr1-http-layer from subir/pr6-job-launcher

Conversation

ddl-subir-m (Collaborator) commented Mar 20, 2026

Why

The job launcher currently uses Domino's v4 Jobs API (/v4/jobs/start, /v4/jobs/{id}), which is internal and undocumented. This creates risk — internal APIs can change without notice.

The public v1 API (/api/jobs/v1/jobs) is documented, stable, and has two advantages:

  1. It accepts hardware tier by name instead of ID, eliminating the extra API call to resolve tier name → ID
  2. The request format is simpler (runCommand instead of commandToRun)
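To make the two advantages concrete, here is a minimal sketch of how the request body might change. Only `commandToRun` and `runCommand` are named in this description; the other keys (`projectId`, `hardwareTierId`, `hardwareTier`) and both helper functions are assumptions for illustration, not the confirmed Domino schema.

```python
from typing import Any, Dict


def build_v4_payload(project_id: str, command: str, tier_id: str) -> Dict[str, Any]:
    """Old shape: the caller must first resolve the hardware tier name to an ID."""
    return {
        "projectId": project_id,     # assumed key name
        "commandToRun": command,     # key named in the PR description
        "hardwareTierId": tier_id,   # assumed key name
    }


def build_v1_payload(project_id: str, command: str, tier_name: str) -> Dict[str, Any]:
    """New shape: the tier is passed by name, so no lookup call is needed."""
    return {
        "projectId": project_id,     # assumed key name
        "runCommand": command,       # key named in the PR description
        "hardwareTier": tier_name,   # assumed key name
    }
```

The point of the comparison is that the v1 builder takes the tier name the user already has, which is what allows the tier-resolution call to be dropped.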

Additionally, cross-project training jobs need the database URL remapped because the SQLite path written by the App (/domino/datasets/local/automl-extension/automl.db) doesn't exist in the target project's mount layout. The launcher now passes --database-url and --job-config as CLI args to workers.
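A rough sketch of how the launcher might assemble a worker command with these two arguments. The script name, the JSON encoding of the job config, and the `build_worker_command` helper are assumptions; only the `--database-url` and `--job-config` flags come from this PR.

```python
import json
import shlex
from typing import Any, Dict


def build_worker_command(script: str, database_url: str, job_config: Dict[str, Any]) -> str:
    """Return a shell-safe command string for a worker job (hypothetical helper)."""
    args = [
        "python",
        script,
        "--database-url",
        database_url,
        "--job-config",
        json.dumps(job_config, sort_keys=True),  # assumption: config travels as JSON
    ]
    # shlex.join quotes each argument so the JSON survives shell parsing.
    return shlex.join(args)
```

Quoting matters here because the JSON contains braces, quotes, and spaces that a shell would otherwise split or interpret.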

Depends on

Summary

  • POST /v4/jobs/start → POST /api/jobs/v1/jobs
  • GET /v4/jobs/{id} → GET /api/jobs/beta/jobs/{id}
  • Removes _resolve_hardware_tier_id() (no longer needed)
  • Adds _job_api_request() with nucleus-first, proxy-fallback routing
  • Adds _remap_db_url_for_target() for cross-project SQLite paths
  • Passes database_url and job_config CLI args to training/EDA workers

Files changed

  • app/core/domino_job_launcher.py — rewritten job launch and status APIs
  • tests/test_domino_job_launcher.py — new tests

Test plan

  • test_domino_job_launcher.py passes
  • Job launch uses v1 API endpoint
  • Job status parsed from v1/beta response envelope
  • Hardware tier passed by name (no ID resolution call)
  • Cross-project DB URL correctly remapped
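The last item could be checked along these lines. The sketch assumes the remap is a plain prefix swap from `/mnt/data/` to `/mnt/imported/data/` on SQLAlchemy-style `sqlite:///` URLs, as the method's docstring in this PR suggests; `remap_db_url` and the constants are hypothetical names, and URLs with a driver suffix such as `sqlite+aiosqlite:///` are deliberately left untouched.

```python
OLD_PREFIX = "/mnt/data/"           # where the App sees its own data
NEW_PREFIX = "/mnt/imported/data/"  # where a job in another project sees it
SQLITE_SCHEME = "sqlite:///"


def remap_db_url(database_url: str) -> str:
    """Swap the local mount prefix for the imported one; leave other URLs alone."""
    if not database_url.startswith(SQLITE_SCHEME):
        return database_url
    path = database_url[len(SQLITE_SCHEME):]
    if not path.startswith(OLD_PREFIX):
        return database_url
    return SQLITE_SCHEME + NEW_PREFIX + path[len(OLD_PREFIX):]
```

A test for this would cover three cases: a matching SQLite path, a SQLite path outside the local mount, and a non-SQLite URL.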

ddl-subir-m and others added 2 commits March 20, 2026 09:44
- Add DominoProjectType enum (DFS/GIT/UNKNOWN) with filesystem-based detection
- Add _db_url_remap for cross-project SQLite URL remapping across mount types
- Add tabular_data module: centralized CSV/parquet preview, schema, row counting
  with LRU caching (replaces scattered pd.read_csv/parquet calls)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lver

- Add normalize_leaderboard_rows/payload to fix TimeSeries fit_time display
- Add resolve_request_project_id() to centralize project context extraction
  from X-Project-Id header, query params, and DOMINO_PROJECT_ID env var

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ddl-subir-m requested a review from niole March 20, 2026 17:41
ddl-subir-m changed the title from "feat: migrate job launcher from v4 to v1/beta Domino Jobs API" to "[DOM-75519] feat: migrate job launcher from v4 to v1/beta Domino Jobs API" Mar 23, 2026
ddl-subir-m and others added 5 commits March 23, 2026 22:08
…roject_id

The env var is the App's own project, not the target project the user
is working in. Falling back to it silently operates on the wrong project
(root cause of datasets showing empty in cross-project scenarios).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add DominoProjectType enum (DFS/GIT/UNKNOWN) with filesystem-based detection
- Add _db_url_remap for cross-project SQLite URL remapping across mount types
- Add tabular_data module: centralized CSV/parquet preview, schema, row counting
  with LRU caching (replaces scattered pd.read_csv/parquet calls)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch job launch from POST /v4/jobs/start to POST /api/jobs/v1/jobs
- Switch job status from GET /v4/jobs/{id} to GET /api/jobs/beta/jobs/{id}
- Remove _resolve_hardware_tier_id (v1 API accepts tier name directly)
- Add _job_api_request with direct-host-first fallback
- Add _remap_db_url_for_target for cross-project database paths
- Pass database_url and job_config as CLI args to workers
DOMINO_ENVIRONMENT_ID and DOMINO_ENVIRONMENT_REVISION_ID are set on
the App container and identify the compute environment with the right
dependencies. Using env vars eliminates per-caller plumbing and ensures
child jobs always match the App's environment.

Removes environment_id param from _job_start, start_training_job, and
start_eda_job. Adds environmentRevisionId to job payload.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve the training data path at job creation time and pass it as
--file-path to the Domino Job command. The worker uses the path
directly instead of needing dataset API access at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ddl-subir-m and others added 7 commits March 24, 2026 10:41
Query params are the canonical approach going forward. The X-Project-Id
header is kept as a fallback for legacy clients only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The frontend sends both header and query param from the same source.
No scenario where header is present but query param isn't. Query param
only — simpler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
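After the commits above, project context comes from the query string alone. A minimal sketch of that final behavior, assuming a parameter named `projectId` (the real name is not shown in this PR) and a hypothetical `resolve_project_id` helper that works on any mapping of query parameters:

```python
from typing import Mapping, Optional


def resolve_project_id(
    query_params: Mapping[str, str],
    param_name: str = "projectId",  # assumed parameter name
) -> Optional[str]:
    """Return the project ID from the query string, or None if absent or blank.

    There is deliberately no header or environment-variable fallback: the
    env var identifies the App's own project, not the user's target project.
    """
    value = query_params.get(param_name, "").strip()
    return value or None
```

Returning `None` rather than a default forces the caller to handle a missing project explicitly instead of silently operating on the wrong one.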
[DOM-75514] feat: core HTTP layer enhancements + debug middleware
[DOM-75515] feat: project type detection, DB URL remap, tabular data helpers
…tils

[DOM-75516] feat: leaderboard normalization utils + request project ID resolver
id_column=request.id_column,
rolling_window=request.rolling_window,
hardware_tier_name=request.domino_hardware_tier_name or settings.domino_eda_hardware_tier_name,
environment_id=request.domino_environment_id or settings.domino_eda_environment_id,
Collaborator

why remove this? Will the environment variables for specifying the environment to use still work?

Collaborator Author

The environment_id parameter was moved to DominoJobLauncher.__init__ (domino_job_launcher.py:36-37), which reads DOMINO_ENVIRONMENT_ID from the environment once at construction, so it no longer needs to be passed per call from the profiling route. The env var still works; it's just read in one place now instead of being threaded through every call site.
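Roughly, the pattern looks like the sketch below. It is not the actual `DominoJobLauncher`: the `LauncherEnvironment` class and the `environmentId` payload key are assumptions, while the two env var names and `environmentRevisionId` are taken from the commit message in this PR.

```python
import os
from typing import Dict, Mapping, Optional


class LauncherEnvironment:
    """Reads the compute environment from env vars once, at construction."""

    def __init__(self, env: Optional[Mapping[str, str]] = None) -> None:
        source = os.environ if env is None else env
        self.environment_id = source.get("DOMINO_ENVIRONMENT_ID")
        self.environment_revision_id = source.get("DOMINO_ENVIRONMENT_REVISION_ID")

    def payload_fields(self) -> Dict[str, str]:
        """Fields to merge into a job payload; unset values are omitted."""
        fields: Dict[str, str] = {}
        if self.environment_id:
            fields["environmentId"] = self.environment_id  # assumed key name
        if self.environment_revision_id:
            fields["environmentRevisionId"] = self.environment_revision_id
        return fields
```

Accepting an optional mapping in the constructor keeps the class testable without mutating the process environment.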


The App's DB lives at e.g. ``/mnt/data/automl_shared_db/automl.db``
(local mount). A Job running in a *different* project sees the
App's data under ``/mnt/imported/data/`` instead. Swap the prefix
Collaborator

We are eliminating the need for the imported data. We should not include this in the functionality here

Collaborator Author

Removed. The _remap_db_url_for_target method and all /mnt/imported/data/ remapping logic are gone. Jobs now pass self.settings.database_url directly.

async def _job_api_request(self, method: str, path: str, **kwargs) -> httpx.Response:
    """Call a Jobs API endpoint, preferring the direct host over the proxy."""
    last_exc: Optional[Exception] = None
    base_urls = self._job_api_base_urls()
Collaborator

I don't think it's necessary to figure out multiple base urls. Just send to the DOMINO_API_HOST

Collaborator Author

Simplified. Removed _job_api_base_urls() and the _job_api_request() fallback loop. All calls now go through DOMINO_API_HOST via the generated client.

request_kwargs.setdefault("max_retries", 0)
is_last = index == len(base_urls) - 1
try:
    return await domino_request(
Collaborator

Use the generated public API client for this.

Collaborator Author

Done. _job_start now uses start_job.asyncio_detailed() with NewJobV1 from the generated public API client. get_job_status uses get_job_details.asyncio_detailed(). Only stop_job still uses raw domino_request since there's no public API for v4 stop.

ddl-subir-m and others added 3 commits March 24, 2026 12:05
…ed client

Per Niole's review:
- Remove _remap_db_url_for_target (eliminating imported data pattern)
- Remove multi-base-URL fallback, use DOMINO_API_HOST directly
- Replace raw domino_request() calls with generated public API client
  for job start (start_job) and status (get_job_details)
- Keep domino_request only for v4 stop (no public API alternative)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>