skills: split into databricks-model-serving (ops) + databricks-ml-training (experimental) by QuentinAmbard · Pull Request #110 · databricks/databricks-agent-skills

QuentinAmbard · 2026-05-28T08:36:51Z

Stacks on top of #84. Merge #84 first; this PR includes those commits at the base and adds the new skill on top.

Why this PR exists

PR #84 lands the model-serving content (endpoint create, query, update, traffic config, AI Gateway, Foundation Model API discovery) into databricks-model-serving. That's the right shape for a serving-ops skill, and it's what reviewers should expect a skill called "model serving" to contain.

The remaining a-d-k content — training a model with MLflow autolog, registering it to Unity Catalog, promoting versions via @prod aliases, custom PyFunc authoring, hand-rolled ResponsesAgent code — is a different lifecycle. It runs before an endpoint exists, often in a notebook submitted as a serverless job, and an agent asked to "train an XGBoost model and deploy it" needs both concerns surfaced cleanly rather than blended into one skill description.

This PR lands the dev-side content as a separate databricks-ml-training experimental skill, and weaves a few small but high-leverage serving-side fixes from the original a-d-k content into databricks-model-serving where they belong.

What this PR improves

A focused dev-side skill. New experimental/databricks-ml-training/ owns the training → register → consume narrative: MLflow autolog with Optuna for hyperparameter tuning, mlflow.set_registry_uri("databricks-uc") + experiment-parent-folder pre-creation, alias-based promotion (@prod / @challenger), batch scoring via mlflow.pyfunc.spark_udf, custom PyFunc with the file-based "Models from Code" pattern, hand-rolled ResponsesAgent with LangGraph + UC Function + Vector Search tools, and the databricks jobs submit --no-wait train-and-deploy pattern.

Frontmatter triggers that actually triage. Each skill's description lists what it IS for and what it explicitly is NOT for, with cross-pointers (databricks-ml-training says "use databricks-model-serving for endpoint ops"; databricks-model-serving says "use databricks-ml-training for training and PyFunc authoring"). When the user says "train a model and deploy it," the orchestrator pulls both skills exactly once each.

Cross-skill links that resolve. Every databricks-model-serving → databricks-ml-training link and every reverse link uses the right relative path for the stable `skills/` ↔ experimental `experimental/` layout. No broken anchors, no stale paths to the old training-and-serving.md filename anywhere.

Five small but high-leverage gaps closed in databricks-model-serving. The original a-d-k port left a few non-obvious serving behaviors implicit. Each fix is woven into the existing section that already covers the topic — no new mega-sections, no duplication of MLflow boilerplate an LLM already knows from training data. The result: serving-side behavior an agent would otherwise have to guess at is now explicit and signposted.

Summary of changes

| Area | What changed |
| --- | --- |
| New experimental skill | `experimental/databricks-ml-training/` with SKILL.md + agents/ + assets/ + references/{custom-pyfunc.md, genai-agents.md}. Owns the full dev-side narrative (autolog + Optuna, UC registration, alias promotion, batch scoring, custom PyFunc, custom ResponsesAgent, train-and-deploy serverless job pattern). |
| Frontmatter scoping | Both skills' descriptions list scope + explicit NOT-for callouts pointing at their sibling. Triggers triage cleanly when the user mentions both training and deployment. |
| Cross-links | All cross-skill paths fixed for the stable ↔ experimental layout. Foundation Model API discovery moved into `databricks-model-serving/SKILL.md` inline (was previously linked into the relocated training file). |
| Serving gaps closed | Five small fixes in `databricks-model-serving/SKILL.md`: MLflow Deployments client gotchas (`tags=` top-level, `served_model_name` derivation), zero-downtime version-swap pattern (alias-repoint AND `update_endpoint`), two-state-field readiness rationale (`state.ready` lies during version-swap), classical-ML `dataframe_records` query example, Serving-UI "Owned by me" SP-filter troubleshooting row. Each merged into the existing section that already covered the topic. |
| DAS-only content preserved | PR #84's idempotency check before agent deploy, AppKit integration section, off-platform AI SDK v6 streaming, endpoint-structure ASCII diagram, OpenAPI schema section: all kept. |
| Versioning | `databricks-model-serving` bumped to 0.4.0 (description retightened, gaps closed). New `databricks-ml-training` at 0.1.0 under `experimental/`. Manifest regenerated; 27 skills total. |

Reviewer aid

The split is on the natural seam — anything that runs before mlflow.deployments.get_deploy_client(...).create_endpoint(...) is dev-side and lives in databricks-ml-training, anything from create_endpoint onward is ops-side and lives in databricks-model-serving. The Python create_endpoint(...) / update_endpoint(...) call itself is canonically a serving operation and is documented there with the two non-obvious gotchas.

Validation: python3 scripts/skills.py validate passes; zero broken links across both touched skills.

This pull request and its description were written by Isaac.

Phase 1 of databricks#73's TODO #1b. Adds references/fm-api-endpoints.md with the curated Foundation Model API endpoint table (chat/instruct + embedding models) from databricks-solutions/ai-dev-kit's model-serving skill, plus common defaults and query examples (CLI + SDK).

Stripped: the cloud/language prefix on the docs link, and the leftover MCP-tool references in the source. The endpoint table itself is static catalog data, with no MCP coupling.

SKILL.md updates:
- bump version to 0.2.0
- point Endpoint Types table at the new reference
- point the Foundation Model discovery bullet at the new reference

Subsequent phases (separate PRs / commits) port the remaining dev-side content: classical-ml autolog patterns, Custom PyFunc signatures, ResponsesAgent with the create_text_output_item gotcha, UCFunctionToolkit + VectorSearchRetrieverTool resource passthrough.

Co-authored-by: Isaac

Aligns the verbatim a-d-k port with the live docs.databricks.com supported-models page (validated via WebFetch on 2026-05-26):

ADDED (missing from a-d-k snapshot):
- databricks-claude-opus-4-7 (now most capable Claude)
- databricks-gpt-5-5-pro, 5-5
- databricks-gpt-5-4, 5-4-mini, 5-4-nano
- databricks-gpt-5-3-codex, 5-2-codex
- databricks-gemini-3-1-flash-lite, 3-5-flash
- databricks-qwen35-122b-a10b (Preview)

REMOVED (retired, no longer in docs):
- databricks-claude-3-7-sonnet
- databricks-meta-llama-3-1-405b-instruct

UPDATED notes:
- claude-opus-4-6 no longer "Most capable"
- gpt-5-2 no longer "Latest"
- gpt-5-1-codex-{max,mini} + gpt-5-2-codex marked retiring 2026-07-16
- gemini-3-pro marked retired 2026-03-26 with redirect through 2026-06-07
- Several Gemini / Codex endpoints annotated with cross-geo requirement
- qwen3-next-80b annotated as Preview

OPENING PARAGRAPH:
- "available in every workspace" -> "available in supported Model Serving regions"; calls out cross-geo requirement for several endpoints

NOT TOUCHED (out of scope: not docs-validatable from supported-models page):
- served_entities[].entity_name guidance (line 3 second half)
- SKILL.md "system.ai.* catalog" claim on the pay-per-token row

These remain as in the a-d-k snapshot and should be revisited if/when docs cover them directly.

Test plan: `scripts/skills.py validate` -> "Everything is up to date"; `scripts/skills.py generate` -> only refreshes manifest.json timestamps.

Co-authored-by: Isaac

…ot static catalog

Quentin pointed out (PR databricks#84) that the prior two commits actually ported from `main:databricks-skills/databricks-model-serving/`, not `experimental:databricks-skills/databricks-ml-training-serving/` as the PR description claimed. The two skills take opposite approaches:
- `main` ships a static catalog table of FM API endpoint names.
- `experimental` deliberately rejects that ("a static skill list goes stale fast: always list at runtime instead of hard-coding names") and ships a `databricks serving-endpoints list | jq ...` one-liner plus runtime-resolved defaults (highest-numbered Claude Sonnet for agents, highest-numbered `-codex-max` for code).

Re-port to match the experimental philosophy:
- `references/fm-api-endpoints.md`: replace the static catalog with the runtime-list snippet (filtered by `databricks-` name prefix AND `system.ai.*` served entity, to exclude non-FM endpoints sharing the prefix), runtime-resolved family defaults, and CLI + SDK query examples that use a placeholder endpoint name rather than a hard-coded model.
- `SKILL.md`: update the Endpoint Types row + the Foundation-Model discovery bullet to reframe the reference as "discover at runtime" rather than "curated table".

Version stays at 0.2.0 (frontmatter unchanged → manifest unchanged). The 2026-05-26 catalog refresh in the previous commit is dropped here: the experimental skill's point is that no static table is the right shape, so curating one against docs.databricks.com isn't useful for the stable skill either.

Co-authored-by: Isaac
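The runtime-discovery rule in this commit (filter on the `databricks-` prefix AND a `system.ai.*` served entity, then take the highest-numbered member of a family) can be sketched in Python. This is only an illustration, not the skill's actual snippet, which is a `databricks serving-endpoints list | jq ...` one-liner. It assumes the list output has already been parsed into dicts with a `name` and a `config.served_entities[].entity_name` field. The helper names and every sample endpoint below other than `databricks-claude-sonnet-4-6` and `databricks-claude-opus-4-7` are hypothetical.

```python
import re


def is_fm_api_endpoint(endpoint: dict) -> bool:
    """True when the name has the `databricks-` prefix AND a served entity is under `system.ai.`."""
    if not endpoint.get("name", "").startswith("databricks-"):
        return False
    entities = (endpoint.get("config") or {}).get("served_entities") or []
    return any(e.get("entity_name", "").startswith("system.ai.") for e in entities)


def version_key(name: str) -> tuple:
    """Turn 'databricks-claude-sonnet-4-6' into (4, 6) so versions compare numerically."""
    return tuple(int(n) for n in re.findall(r"\d+", name))


def pick_default(endpoints: list, *family_tokens: str):
    """Highest-numbered FM API endpoint whose name contains every family token, else None."""
    candidates = [
        e["name"]
        for e in endpoints
        if is_fm_api_endpoint(e) and all(tok in e["name"] for tok in family_tokens)
    ]
    return max(candidates, key=version_key, default=None)


def _entry(name: str, entity: str) -> dict:
    """Build one parsed list entry (shape is an assumption, see lead-in)."""
    return {"name": name, "config": {"served_entities": [{"entity_name": entity}]}}


# Hypothetical sample data standing in for the parsed CLI output.
endpoints = [
    _entry("databricks-claude-sonnet-4-5", "system.ai.claude-sonnet-4-5"),
    _entry("databricks-claude-sonnet-4-6", "system.ai.claude-sonnet-4-6"),
    _entry("databricks-claude-opus-4-7", "system.ai.claude-opus-4-7"),
    # Shares the prefix and family tokens but is not a Foundation Model API endpoint.
    _entry("databricks-claude-sonnet-9-9-finetune", "main.ml.my_finetune"),
]

agent_default = pick_default(endpoints, "claude", "sonnet")
code_default = pick_default(endpoints, "codex-max")  # None: no such family in the sample
```

Comparing integer tuples rather than raw strings keeps a hypothetical `4-10` ahead of `4-6`. The served-entity check is what stops the fine-tuned lookalike from being picked.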

…ental port

Previous commit (c148500) restated the experimental section in my own words and added a "Querying" section + provisioned-throughput aside + docs-link gloss that aren't in the upstream skill. The PR's stated goal is to port from experimental: do an actual port, not a paraphrase.

`references/fm-api-endpoints.md` now mirrors the `## Foundation Model API endpoints` section of `experimental:databricks-ml-training-serving/SKILL.md` verbatim (heading promoted from `##` to `#` since this is a standalone file): intro paragraph + the `databricks serving-endpoints list | jq ...` one-liner + the family-based default-picking rule. Nothing else.

Also trim the SKILL.md discovery bullet back toward its original shape: link to the reference file for the runtime-list snippet, then the same `system.ai` / `serving-endpoints list` / `get-open-api` alternatives that were already there.

Co-authored-by: Isaac

…ntal

Expands the port from the FM-endpoints-only scope to cover every section of `experimental:databricks-ml-training-serving/`. Mirrors the experimental skill's 3-file structure 1:1 into stable's `references/` directory; the standalone fm-api-endpoints.md added in earlier commits goes away (its content lives inline in training-and-serving.md exactly as it does in experimental's SKILL.md).

Added (all verbatim ports, mechanical adjustments only):

references/training-and-serving.md
  Ports experimental SKILL.md content. Mechanical changes only: frontmatter stripped (destination is a reference file, not a SKILL.md); `1-custom-pyfunc.md` → `custom-pyfunc.md`, `2-genai-agents.md` → `genai-agents.md` (filename renames); `../<skill>/SKILL.md` → `../../<skill>/SKILL.md` (one more level of nesting since this file is in references/ rather than at the skill root). Content covers: canonical train/register/serve flow, `mlflow.{sklearn,xgboost,…}.autolog()` patterns, UC alias-based promotion, batch scoring via `spark_udf`, real-time endpoint create + zero-downtime version swap, `state.ready` vs `state.config_update` poll-both gotcha, `jobs submit --no-wait` serverless deploy pattern, Foundation Model API endpoints runtime-list, and the full gotchas trap-table.

references/custom-pyfunc.md
  Ports experimental 1-custom-pyfunc.md verbatim. Mechanical change: `[SKILL.md]` → `[training-and-serving.md]` where the original cross-referenced its parent SKILL.md. Content: file-based PyFunc ("Models from Code"), `infer_signature`, `code_paths`, pre-deploy validation via `mlflow.models.predict(env_manager="uv")`.

references/genai-agents.md
  Ports experimental 2-genai-agents.md verbatim. Mechanical changes: cross-skill paths bumped one level deeper; `[SKILL.md]` → `[training-and-serving.md]`. Content covers: `ResponsesAgent` interface, LangGraph agent with `UCFunctionToolkit` + `VectorSearchRetrieverTool`, the `create_text_output_item` raw-dict-silently-fails gotcha, the `resources=[...]` passthrough-auth list (DatabricksServingEndpoint, DatabricksFunction, DatabricksVectorSearchIndex, DatabricksLakebase), async deploy via `agents.deploy()` from a serverless job, query via CLI and OpenAI-compatible client.

Removed:

references/fm-api-endpoints.md
  Standalone file from earlier commits; its content lives inline in training-and-serving.md exactly as it does in experimental's SKILL.md, so the deliberate split is no longer needed.

Stable SKILL.md updates (minimal, ops-focus preserved):
- FM-endpoint link targets updated from `references/fm-api-endpoints.md` to `references/training-and-serving.md#foundation-model-api-endpoints` in the Endpoint Types table row and the FM-discovery bullet.
- New `### Develop & deploy new models` subsection under "What's Next" with a 3-row table pointing at the new dev-side references, framed as "this skill is ops-focused; for the dev-side flow, see below".

Manifest regenerated.

Co-authored-by: Isaac

- The mechanical `../` → `../../` rewrite in the verbatim port assumed every peer skill is stable, but 4 of them live in `experimental/`. `../../<skill>/SKILL.md` resolved to `skills/<skill>/SKILL.md` which does not exist for `databricks-agent-bricks`, `databricks-mlflow-evaluation`, `databricks-vector-search`, `databricks-unity-catalog`. Repointed to `../../../experimental/<skill>/SKILL.md`. `databricks-jobs` link unchanged (it's stable).
- SKILL.md frontmatter `description` only described the ops surface, so agents wouldn't route dev-side asks (train, register, PyFunc, ResponsesAgent) to this skill. Broadened to cover both ops and the new dev surface.
- Version bumped 0.2.0 → 0.3.0 + manifest regenerated.

Co-authored-by: Isaac

…-phase1 # Conflicts: # manifest.json

@simonfaltum

Per @simonfaltum review: before resubmitting a deploy serverless job, agents should check whether a run is already in flight (active job runs filtered on run_name) or whether the target endpoint already exists in the right state. Avoids wasting ~15 min of serverless and racing for the same endpoint name. Co-authored-by: Isaac
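The pre-submit check described in this commit can be sketched as a pure decision function. This is a minimal sketch under stated assumptions, not the skill's implementation: `active_runs` is assumed to be the parsed output of `databricks jobs list-runs --active-only -o json` as a list of dicts with a `run_name` key, and `endpoint` is assumed to be a parsed endpoint description carrying `state.ready` and `state.config_update`. The function name and the reading of "right state" are hypothetical.

```python
def should_submit_deploy(active_runs, run_name, endpoint):
    """Return (submit?, reason) for a deploy run.

    active_runs: list of dicts with a `run_name` key (assumed shape).
    endpoint:    dict with a `state` sub-dict, or None when the endpoint does not exist.
    """
    # 1. Another run with the same name is still in flight: do not race it.
    if any(r.get("run_name") == run_name for r in active_runs):
        return False, "a run with this name is already in flight"

    if endpoint is not None:
        state = endpoint.get("state") or {}
        # 2. A config update is rolling out: wait rather than pile on.
        if state.get("config_update") == "IN_PROGRESS":
            return False, "endpoint is mid-update"
        # 3. Both state fields agree the endpoint is serving: nothing to do.
        if state.get("ready") == "READY" and state.get("config_update") == "NOT_UPDATING":
            return False, "endpoint already exists and is serving"

    return True, "no in-flight run and no usable endpoint"
```

Whether a serving endpoint should block a new deploy depends on intent: when the goal is to roll out a newer model version, the third branch would also need to compare the served version, which this sketch leaves out.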

…icks-ml-training

Splits the post-port databricks-model-serving skill into two skills with clean responsibility boundaries: databricks-model-serving keeps the endpoint lifecycle / ops surface, and a new experimental databricks-ml-training owns the dev-side training, MLflow tracking, UC registration, custom PyFunc, and hand-rolled ResponsesAgent content.

Also closes five small gaps in databricks-model-serving where non-obvious serving behavior from the original a-d-k port had fallen through the cracks (Python deployments client gotchas, zero-downtime version swap, two-field readiness rationale, classical-ML query shape, Serving-UI SP filter).

Co-authored-by: Isaac

simonfaltum

Reviewed the proposed end state. Note this PR is one commit stacked on the still-open #84, so what shows here is the combined ~712-line delta against main; merge #84 first, then re-check this PR's true delta (relevant to the vector-search link below).

Verdict: fix-then-merge - no blockers, but a few things to address, flagged inline.

Headline items:

The HPO "train and register" example silently promotes the wrong model (autolog registers every trial; promotion picks latest-by-version, not best-by-metric).
A cross-link to databricks-jobs points at a section (and content) that doesn't exist.
The databricks-vector-search link will break once rebased onto current main (vector-search moved to skills/).
MLflow pins (mlflow==2.22.0) contradict the "MLflow 3" text and the skill's own pin-to-runtime rule.

Verified clean (so you see coverage):

python3 scripts/skills.py validate passes; manifest / Codex metadata / icons in sync. (The model-serving manifest description staying short is by design - stable skills get a curated description, experimental ones derive from frontmatter.)
The MLflow APIs are real, not invented: ResponsesAgent, the ResponsesAgentRequest/Response/StreamEvent classes, output_to_responses_items_stream / to_chat_completions_input (match the official ResponsesAgent docs), and DatabricksLakebase(database_instance_name=...).
CLI flags verified against the CLI: jobs submit --no-wait, jobs list-runs --active-only -o json.
No real credentials / workspace IDs; placeholders throughout; no destructive defaults.
Strategy is strong: climbs to MLflow + UC registry + serverless jobs + serving, lands a durable governed gold UC table, delegates no-code agents to databricks-agent-bricks, public APIs only.

The model-serving additions (Deployments-client gotchas, alias + update_endpoint version swap, two-field readiness, dataframe_records query, runtime FM-API listing, SP-filter troubleshooting) are high-signal - one small client-variable issue noted inline.

Posted as a COMMENT (advisory, non-blocking).

simonfaltum · 2026-06-08T13:42:15Z

+client = MlflowClient(registry_uri="databricks-uc")
+latest = max(client.search_model_versions(f"name='{FULL_NAME}'"),
+             key=lambda v: int(v.version))
+client.set_registered_model_alias(FULL_NAME, "prod", latest.version)


Promotes the wrong model. With autolog(registered_model_name=FULL_NAME) (line 85), every trial's .fit() logs and registers a version, so 20 trials produce ~20 versions. max(..., key=version) here then picks the last trial to finish, not the best by AUC, so @prod lands on an arbitrary model and the Optuna search is wasted. The prose at line 45 ("the best one is what gets registered") is inaccurate.

Fix: after study.optimize, either retrain once on study.best_params in a single parent run and register that, or select the winning run explicitly, e.g. client.search_runs(experiment_ids=[...], order_by=["metrics.<auc> DESC"], max_results=1) and alias that version.
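The second option (select the winning run explicitly, then alias its version) might look roughly like the sketch below. It assumes `client` is an `mlflow.tracking.MlflowClient`, that the metric is logged under the hypothetical name `val_auc`, and that each registered version's `run_id` points back at the run that produced it. The function name is also hypothetical, and how runs that lack the metric are ordered is not checked here.

```python
def promote_best_version(client, full_name, experiment_id, metric="val_auc", alias="prod"):
    """Alias the registered version that came from the best-scoring run."""
    runs = client.search_runs(
        experiment_ids=[experiment_id],
        order_by=[f"metrics.{metric} DESC"],
        max_results=1,
    )
    if not runs:
        raise LookupError(f"no runs found in experiment {experiment_id}")
    best_run_id = runs[0].info.run_id

    # Match on run_id instead of max(version): the newest version is only
    # the last trial to finish, which need not be the best one.
    versions = client.search_model_versions(f"name='{full_name}'")
    winners = [v for v in versions if v.run_id == best_run_id]
    if not winners:
        raise LookupError(f"no registered version of {full_name} for run {best_run_id}")

    client.set_registered_model_alias(full_name, alias, winners[0].version)
    return winners[0].version
```

The first option (retrain once on `study.best_params` and register only that model) avoids the lookup entirely, since only one version is ever created.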

simonfaltum · 2026-06-08T13:42:15Z

+# → '{"model_version":"3","val_auc":0.91,"rows_scored":124,"endpoint":"turbine-risk-endpoint"}'
+```
+
+For the four `jobs submit` traps (`spec.client: "4"` requirement, TASK-vs-submit run_id, `print()` unreliable, tags rejected) and full debugging flow, see **[databricks-jobs](../../skills/databricks-jobs/SKILL.md#one-time-runs-jobs-submit--async-pattern-for-notebooks)**.


This anchor doesn't resolve. databricks-jobs/SKILL.md has no heading matching #one-time-runs-jobs-submit--async-pattern-for-notebooks, and the "four jobs submit traps / full debugging flow" content isn't in that skill at all (grep for jobs submit, --no-wait, notebook_output finds nothing; only spec.client: "4" exists, and it's in its references/task-types.md). An agent following the link lands at the top of databricks-jobs and never finds what's promised here.

Fix: inline the four traps here, point at the real location, or add the section to databricks-jobs.

simonfaltum · 2026-06-08T13:42:15Z

+- **[databricks-model-serving](../../skills/databricks-model-serving/SKILL.md)** — serving-endpoint lifecycle (create, query, update-config, version-swap, AI Gateway, Foundation Model API endpoints).
+- **[databricks-agent-bricks](../databricks-agent-bricks/SKILL.md)** — no-code Knowledge Assistants and Supervisor Agents. Prefer this over hand-rolling agents.
+- **[databricks-mlflow-evaluation](../databricks-mlflow-evaluation/SKILL.md)** — evaluate model/agent quality before promoting `@prod`.
+- **[databricks-vector-search](../databricks-vector-search/SKILL.md)** — vector indexes used as retrieval tools in agents.


This link will break once the PR rebases onto current main. databricks-vector-search was promoted from experimental/ to skills/ on main (it's now skills/databricks-vector-search/, gone from experimental/). ../databricks-vector-search/SKILL.md resolves on this branch only because it's based on an older main.

Fix: ../../skills/databricks-vector-search/SKILL.md.
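To see why the two paths differ, the relative links can be resolved against the file that contains them. This is a small stdlib sketch, assuming repo-root-relative paths with `skills/` and `experimental/` as sibling directories; `resolve_link` is a hypothetical helper, not part of `scripts/skills.py`.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative Markdown link against the file that contains it."""
    target = link.split("#", 1)[0]  # drop any anchor before resolving
    base = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base, target))


source = "experimental/databricks-ml-training/SKILL.md"

# Current link: stays inside experimental/, where the skill no longer lives on main.
before = resolve_link(source, "../databricks-vector-search/SKILL.md")

# Suggested link: climbs to the repo root, then descends into skills/.
after = resolve_link(source, "../../skills/databricks-vector-search/SKILL.md")
```

Here `before` resolves to `experimental/databricks-vector-search/SKILL.md` and `after` to `skills/databricks-vector-search/SKILL.md`.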

simonfaltum · 2026-06-08T13:42:15Z

+        resources=resources,             # auto-auth — DO NOT skip
+        input_example={"input": [{"role": "user", "content": "What's the maintenance history for turbine WTG-12?"}]},
+        pip_requirements=[
+            "mlflow==2.22.0",


This pin contradicts the surrounding text and the skill's own rule. The text calls ResponsesAgent "MLflow 3's standardized agent interface" and notes "DBR 16.1+ has mlflow 3.x", but pins mlflow==2.22.0 here (and in custom-pyfunc.md:72, SKILL.md:182). The Gotchas table (SKILL.md:236) warns that a pip_requirements mismatch crashes the endpoint at load and says to pin to the runtime. Logging from a DBR-3.x runtime but forcing serving to 2.22.0 is exactly that skew.

Fix: pin the MLflow 3.x version DBR ships, or use the live f"mlflow=={version('mlflow')}" pattern the skill already recommends. (ResponsesAgent does exist in 2.22.0, so it's not a guaranteed import error, but the version skew with databricks-langchain / langgraph is the real risk.)
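The "live" pin pattern can be sketched with the standard library. This assumes the `version` in the quoted f-string is `importlib.metadata.version`, and that the script runs in the same environment that logs the model. The `pinned` helper and its fall-back to an unpinned name are illustrative choices, not something the skill prescribes.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned(package: str) -> str:
    """Return 'pkg==<installed version>', or the bare name if the package is absent."""
    try:
        return f"{package}=={version(package)}"
    except PackageNotFoundError:
        # Illustrative fallback; a real logging script may prefer to fail loudly.
        return package


# Pin serving to whatever the logging runtime actually has installed,
# so the two environments cannot drift apart.
pip_requirements = [pinned(p) for p in ("mlflow", "databricks-langchain", "langgraph")]
```

Because the pins are read at logging time, bumping the runtime updates the serving requirements with no hard-coded version string to go stale.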

simonfaltum · 2026-06-08T13:42:15Z

@@ -0,0 +1,257 @@
+---
+name: databricks-ml-training
+description: "Classical ML and custom-agent model training, MLflow tracking, and Unity Catalog model registration on Databricks. Use when the user asks to: train models (with MLflow, sklearn, XGBoost, LightGBM, PyTorch, custom pyfunc, etc.); run hyperparameter tuning with Optuna; register models to Unity Catalog and promote versions with `@prod` / `@challenger` aliases; load a registered model for batch scoring via `mlflow.pyfunc.spark_udf`; run inferences as batch, build custom MLflow PyFunc models (Models from Code); author a custom MLflow `ResponsesAgent` (LangGraph, OpenAI-compatible chat) with UC Function or Vector Search tools. NOT for: managing existing serving endpoints (use databricks-model-serving); no-code Knowledge Assistants or Supervisor Agents (use databricks-agent-bricks); MLflow evaluation / scorers (use databricks-mlflow-evaluation)."


At ~850 characters this is the longest skill description in the repo (the experimental median is ~250; the next-longest is ~740). It's the agent's routing input. The explicit "Use when / NOT for" triage is genuinely useful and worth keeping, but the framing around it could be trimmed to bring it closer to siblings.

simonfaltum · 2026-06-08T13:42:15Z

+from langgraph.prebuilt.tool_node import ToolNode
+from typing import Annotated, Generator, Sequence, TypedDict
+
+LLM_ENDPOINT = "databricks-claude-sonnet-4-6"   # resolve at runtime — see training-and-serving.md


Stale reference. training-and-serving.md doesn't exist anywhere in the repo (the PR description itself says no such paths remain). Drop the comment or point at the live source for resolving the LLM endpoint at runtime.

simonfaltum · 2026-06-08T13:42:15Z

+model_name    = sys.argv[1]
+version       = sys.argv[2]
+endpoint_name = sys.argv[3] if len(sys.argv) > 3 else None
+
+# Always pass endpoint_name explicitly — auto-derived names are
+# `agents_<catalog>-<schema>-<model>` with dots → dashes, which is unpredictable.
+kwargs = {"tags": {"aidevkit_project": "ai-dev-kit"}}
+if endpoint_name:
+    kwargs["endpoint_name"] = endpoint_name
+
+deployment = agents.deploy(model_name, version, **kwargs)
+
+# Land structured output via dbutils.notebook.exit — print() unreliable on serverless.
+dbutils.notebook.exit(json.dumps({
+    "endpoint_name":  deployment.endpoint_name,
+    "query_endpoint": deployment.query_endpoint,
+}))


This won't run as written. The block reads parameters from sys.argv[1..3], but line 220 says to submit it "as the notebook" via jobs submit, and the submit JSON passes no parameters. Notebook tasks receive parameters via dbutils.widgets, not sys.argv, so sys.argv[1] raises IndexError. (dbutils.notebook.exit at line 214 is notebook-only, confirming this is meant as a notebook.)

Fix: read params via dbutils.widgets.get(...) and pass base_parameters in the submit, or run it as a spark_python_task.
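The first option can be sketched as two halves that must agree on parameter names: the submit payload carries `base_parameters`, and the notebook reads them back through `dbutils.widgets`. This is a minimal sketch, not the skill's code. The function names, task key, run name and paths are hypothetical, `dbutils` is passed in explicitly so the snippet also runs outside a notebook, and the serverless environment settings a real submit needs are omitted.

```python
import json


def build_submit_payload(notebook_path, model_name, version, endpoint_name=""):
    """JSON body for a one-time run whose notebook task carries its parameters."""
    return {
        "run_name": f"deploy-{model_name}-v{version}",
        "tasks": [{
            "task_key": "deploy_agent",
            "notebook_task": {
                "notebook_path": notebook_path,
                # Every value is sent as a string; the notebook reads them by key.
                "base_parameters": {
                    "model_name": model_name,
                    "version": str(version),
                    "endpoint_name": endpoint_name,
                },
            },
        }],
    }


def read_deploy_params(dbutils):
    """Notebook side: read the same keys back through widgets instead of sys.argv."""
    dbutils.widgets.text("endpoint_name", "")  # optional parameter, so declare a default
    return {
        "model_name": dbutils.widgets.get("model_name"),
        "version": dbutils.widgets.get("version"),
        "endpoint_name": dbutils.widgets.get("endpoint_name") or None,
    }


payload = build_submit_payload(
    "/Workspace/Users/someone@example.com/deploy_agent",  # hypothetical path
    "main.ml.turbine_agent",                               # hypothetical model name
    3,
)
payload_json = json.dumps(payload)  # what would be passed to `jobs submit --json`
```

Keeping both halves next to each other makes a key mismatch between `base_parameters` and the widget names easier to spot.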

simonfaltum · 2026-06-08T13:42:15Z

+client.set_registered_model_alias(FULL_NAME, "prod", new_version)
+client.update_endpoint(endpoint=ENDPOINT_NAME, config={
+    "served_entities": [{"entity_name": FULL_NAME, "entity_version": new_version,
+                         "workload_size": "Small", "scale_to_zero_enabled": True}],
+    "traffic_config": {"routes": [
+        {"served_model_name": f"{NAME}-{new_version}", "traffic_percentage": 100}
+    ]},
+})


These two calls are on different client types but share one client variable. set_registered_model_alias(...) is a method on mlflow.tracking.MlflowClient; update_endpoint(...) is on the MLflow Deployments client (mlflow.deployments.get_deploy_client("databricks")). As written, a single client can't do both - whichever it is, the other call raises AttributeError.

Fix: use two distinctly-named clients, e.g. mlflow_client.set_registered_model_alias(...) and deploy_client.update_endpoint(...).
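A sketch of the two-client shape follows. It takes both clients as arguments so it runs without a workspace; in practice `registry` would be an `MlflowClient` and `deploy` the object returned by `get_deploy_client("databricks")`. The function name is hypothetical, and deriving the served name from the last segment of the UC name is an assumption based on the `f"{NAME}-{new_version}"` line in the quoted diff.

```python
def swap_version(registry, deploy, full_name, endpoint_name, new_version, alias="prod"):
    """Repoint the alias on the registry client, then update the endpoint on the deploy client."""
    short_name = full_name.split(".")[-1]  # assumption: served name uses the model's short name

    # Registry surface: mlflow.tracking.MlflowClient
    registry.set_registered_model_alias(full_name, alias, new_version)

    # Serving surface: MLflow Deployments client
    return deploy.update_endpoint(endpoint=endpoint_name, config={
        "served_entities": [{
            "entity_name": full_name,
            "entity_version": new_version,
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        }],
        "traffic_config": {"routes": [
            {"served_model_name": f"{short_name}-{new_version}", "traffic_percentage": 100},
        ]},
    })


# In a workspace the two clients come from different factories:
#   from mlflow import MlflowClient
#   from mlflow.deployments import get_deploy_client
#   registry = MlflowClient(registry_uri="databricks-uc")
#   deploy = get_deploy_client("databricks")
```

Naming the parameters after their surfaces makes it hard to call a registry method on the deployments client by accident.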

simonfaltum · 2026-06-08T13:42:16Z

+scored = features.withColumn("risk_score", predict(*[features[c] for c in feature_cols]))
+
+# Overwrite-per-run pattern for "latest score per entity":
+scored.select("turbine_id", "risk_score", F.current_timestamp().alias("scored_at")) \


Two copy-paste snags: feature_cols is used but never defined, and F.current_timestamp() needs from pyspark.sql import functions as F (not auto-imported). Worth fixing so the example runs verbatim.

- ml-training/SKILL.md:
  - Autolog without registered_model_name; retrain best params explicitly and register that single model (avoids max-by-version landing on last trial).
  - Add `from pyspark.sql import functions as F` import; define feature_cols in the batch-scoring block.
  - Drop the bogus `#one-time-runs-jobs-submit--async-pattern-for-notebooks` anchor; reword the jobs-submit traps inline and point at databricks-jobs.
  - Normalize `../../skills/X/SKILL.md` -> `../X/SKILL.md` (install-flatten).
  - Pin `mlflow==3.1.0`.
- ml-training/references/custom-pyfunc.md: pin `mlflow==3.1.0`; normalize the model-serving link to install-flatten path.
- ml-training/references/genai-agents.md: pin `mlflow==3.1.0`; replace stale `training-and-serving.md` pointer with `databricks-model-serving`; replace `sys.argv[...]` with `dbutils.widgets` for notebook param wiring; update default deploy tag to `ai_generated_source=databricks-agent-skills`.
- model-serving/SKILL.md: split shared `client` into `registry` (MlflowClient) and `deploy` (Deployments client); the two surfaces are different objects even though both happened to be called `client` before.

Co-authored-by: Isaac

QuentinAmbard · 2026-06-12T19:21:19Z

Thanks for the thorough review, @simonfaltum — addressed in 68bf9e5:

experimental/databricks-ml-training/SKILL.md

Autolog now runs without registered_model_name. A separate best run retrains with study.best_params and registers that single artifact, so @prod points at the actual winning trial rather than whichever trial happened to finish last.
Added the missing from pyspark.sql import functions as F import and defined feature_cols in the batch-scoring block.
Dropped the bogus #one-time-runs-jobs-submit--async-pattern-for-notebooks anchor and reworded the jobs-submit traps inline. Also normalized ../../skills/X/SKILL.md → ../X/SKILL.md everywhere — the install-flatten convention puts experimental/ and skills/ in the same flat directory at install time.
Bumped mlflow==2.22.0 → mlflow==3.1.0.

references/custom-pyfunc.md

mlflow==3.1.0; same install-flatten path normalization.

references/genai-agents.md

mlflow==3.1.0.
Replaced the stale training-and-serving.md pointer with databricks-model-serving.
sys.argv[...] → dbutils.widgets for the deploy_agent notebook params.
Default deploy tag is now ai_generated_source=databricks-agent-skills.

skills/databricks-model-serving/SKILL.md

Split the shared client into registry = MlflowClient(...) and deploy = get_deploy_client(...). The alias call and the endpoint update target two different SDK surfaces — reusing one variable name was misleading.

Description length: left as-is per discussion — the long discriminator helps Claude pick the right skill when there are several adjacent ML skills.

…l-training-split # Conflicts: # manifest.json # skills/databricks-model-serving/SKILL.md

…able

After merging upstream/main (which brought in PR databricks#84's pre-split references/{training-and-serving,custom-pyfunc,genai-agents}.md and a stale "What's Next" table pointing at them), clean up the residuals and tighten both skill descriptions so an LLM orchestrator routes correctly:
- skills/databricks-model-serving/SKILL.md:
  - Drop the duplicate `### Develop & deploy new models` block left by the upstream merge. The earlier copy pointed at the now-removed `references/training-and-serving.md`, `references/custom-pyfunc.md`, `references/genai-agents.md`. Keep the single pointer to databricks-ml-training, and normalize its path to the install-flatten convention (`../databricks-ml-training/SKILL.md`).
  - Expand the YAML `description:` to mention OpenAPI schema retrieval, serving logs/metrics/permissions inspection, and off-platform streaming (Vercel AI SDK v6 / standalone Node.js into AI Gateway). Those are all already in the body / off-platform-streaming.md but weren't in the description's trigger list, so user phrasings like "stream from my Next.js app to a Databricks model" or "get the OpenAPI spec for my endpoint" wouldn't route here.
- experimental/databricks-ml-training/SKILL.md:
  - Add `submit a train-and-deploy notebook as a databricks jobs submit --no-wait serverless one-time run` to the description trigger list; that pattern has its own section in the body but wasn't in the description.

Regenerated manifest.json.

Co-authored-by: Isaac

QuentinAmbard · 2026-06-17T16:25:44Z

@simonfaltum @dustin-anchorage when you get a chance, could you do a final pass? Conflict with upstream/main is resolved (merge brought in #84's pre-split refs which I removed since the dev-side content lives in the new ml-training skill), and the two skill frontmatter descriptions were also tightened to cover OpenAPI schemas, off-platform streaming, serving logs/metrics, and the jobs-submit serverless pattern — body content already covered these but the description didn't, so an LLM orchestrator would have missed those user phrasings. Tip is e9692af. Thanks!

Both descriptions were the longest in the repo (model-serving 895 chars, ml-training 949) vs an experimental median of ~250 and a next-longest of ~559. The "Use when / NOT for" triage is load-bearing for routing and stays, but the framing was over-verbose.
- Collapse the CRUD verb list ("create, query, update, scale, or delete serving endpoints") to "CRUD serving endpoints": same trigger surface, much denser.
- Drop redundant qualifiers in NOT-for lists ("no-code ... use X" → "(X)"; "MLflow evaluation / scorers" → "MLflow evaluation"; etc.).
- ml-training: drop the framework enumeration ("MLflow, sklearn, XGBoost, ..."), fold hyperparameter-tuning into the train trigger, shorten "Models from Code" and `ResponsesAgent` qualifiers; the body retains the full detail, and the description only needs to route.

Result: model-serving 895→717 chars, ml-training 949→586. Still on the high end (these skills genuinely have many triggers) but no longer outliers.

Co-authored-by: Isaac

Add `classification/regression` and the four most common framework names (XGBoost, scikit-learn, LightGBM, PyTorch) to the train trigger. The body already covers these, but agents routing on a phrase like "train an XGBoost classifier" or "regression model" benefit from explicit keyword hits in the description. Co-authored-by: Isaac

QuentinAmbard · 2026-06-17T16:30:44Z

@simonfaltum @dustin-anchorage when you have a minute, could you do a final pass?

Since the last round:

Resolved the conflict with upstream/main (the merge brought in the pre-split refs from PR #84, "skills(model-serving): merge dev-side training/agent flows from a-d-k experimental", which I removed since the dev-side content now lives in databricks-ml-training).
Densified both frontmatter descriptions per Simon's earlier comment (model-serving 895→717 chars, ml-training 949→640). The 'Use when / NOT for' triage stays since it's the routing input, but the framing around it is tighter — CRUD verbs collapsed, redundant qualifiers dropped.
ml-training description now surfaces task + framework keywords (classification/regression, XGBoost, scikit-learn, LightGBM, PyTorch) so phrasings like 'train an XGBoost classifier' hit the trigger.
Audit caught a stale 'What's Next' refs table left by the merge — dropped it.

Tip is a8ebfbb. Thanks!

dustinvannoy-db

One suggestion I'd like you to accept to fix a statement that contradicts docs. Then you can merge.

dustinvannoy-db · 2026-06-17T20:57:51Z

+
+## Train and register (the 90% case)
+
+`mlflow.autolog()` captures params, metrics, code, and the model artifact for every run; `registered_model_name=...` auto-registers the best run to UC (auto-incremented version). Wrap training with **Optuna** so each trial is a child run and the best one is what gets registered.


Suggested change

`mlflow.autolog()` captures params, metrics, code, and the model artifact for every run; `registered_model_name=...` auto-registers the best run to UC (auto-incremented version). Wrap training with **Optuna** so each trial is a child run and the best one is what gets registered.

`mlflow.autolog()` captures params, metrics, code, and the model artifact for every run. Wrap training with **Optuna** so each trial is a nested run. **Don't** set `registered_model_name=…` on autolog — it registers a new UC version on *every* trial; instead retrain once on `study.best_params` and register only that winning model (below).
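To make the suggested wording concrete, here is a minimal sketch of the pattern it describes: nested runs per Optuna trial, no `registered_model_name` on any autolog call, then a single retrain on `study.best_params` that is the only registration. This is not the code from the skill file; the scikit-learn `RandomForestClassifier`, the search space, the run names, and the helpers `uc_model_name` and `tune_then_register` are all assumptions chosen for illustration.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Unity Catalog models are addressed by a three-level name."""
    return f"{catalog}.{schema}.{model}"


def tune_then_register(X_train, y_train, X_val, y_val, model_name: str, n_trials: int = 20):
    """Tune with Optuna under MLflow autolog, then register only the retrained winner."""
    # Imported lazily so this module still loads where these libraries are absent.
    import mlflow
    import optuna
    from mlflow.models import infer_signature
    from sklearn.ensemble import RandomForestClassifier

    mlflow.set_registry_uri("databricks-uc")
    mlflow.autolog()  # deliberately no registered_model_name on any autolog call

    def objective(trial):
        # Hypothetical search space; adjust to the model being tuned.
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 50, 300),
            "max_depth": trial.suggest_int("max_depth", 2, 12),
        }
        with mlflow.start_run(nested=True):  # one nested run per trial
            clf = RandomForestClassifier(random_state=0, **params).fit(X_train, y_train)
            return clf.score(X_val, y_val)

    with mlflow.start_run(run_name="optuna-search"):
        study = optuna.create_study(direction="maximize")
        study.optimize(objective, n_trials=n_trials)

    # Retrain once on the best params; this is the only model that gets registered.
    with mlflow.start_run(run_name="best-model"):
        best = RandomForestClassifier(random_state=0, **study.best_params).fit(X_train, y_train)
        signature = infer_signature(X_val, best.predict(X_val))  # UC requires a signature
        mlflow.sklearn.log_model(
            best, "model", signature=signature, registered_model_name=model_name
        )
    return study.best_params
```

A call might look like `tune_then_register(X_train, y_train, X_val, y_val, uc_model_name("main", "ml", "churn"))`, where the catalog, schema, and model names are placeholders. Registering in a separate final run keeps the UC version history at one new version per tuning session rather than one per trial.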

jamesbroadhead and others added 9 commits May 26, 2026 09:47

Merge remote-tracking branch 'origin/main' into jb/model-serving-port…

15d7b4c

…-phase1 # Conflicts: # manifest.json

QuentinAmbard requested review from a team, dustinvannoy-db, lennartkats-db and simonfaltum as code owners May 28, 2026 08:36

QuentinAmbard mentioned this pull request May 28, 2026

skills(model-serving): merge dev-side training/agent flows from a-d-k experimental #84

Merged

6 tasks

simonfaltum approved these changes Jun 2, 2026


simonfaltum reviewed Jun 8, 2026


QuentinAmbard requested a review from a team as a code owner June 17, 2026 16:17

Quentin Ambard added 2 commits June 17, 2026 18:20

Merge remote-tracking branch 'upstream/main' into skills/databricks-m…

e21c472

…l-training-split # Conflicts: # manifest.json # skills/databricks-model-serving/SKILL.md

Quentin Ambard added 2 commits June 17, 2026 18:28

dustinvannoy-db approved these changes Jun 17, 2026




4 participants
