Expand AI agent detection in user-agent by simonfaltum · Pull Request #768 · databricks/databricks-sdk-java

simonfaltum · 2026-04-20T08:37:33Z

Why

We report an agent/<name> segment in the SDK user-agent when we can identify an AI coding agent driving the SDK. The current list covers 8 agents. This PR fills obvious gaps (Goose, Amp, Augment, Kiro, Windsurf), adds best-effort detection for VS Code Copilot (distinct from the already-detected Copilot CLI), and honors the emerging AGENT=<name> agents.md standard with an unknown fallback.

Identical changes are going out in parallel PRs for the Go and Python SDKs.

Changes

Before: each entry was a single (envVar, product) pair. Any non-empty value on the env var fired the match. If more than one entry matched, detection returned an empty string.

Now: each agent record holds a product name and a list of matchers. A matcher is either presence-only or an exact value match. An agent fires if any of its matchers fires. Ambiguity is judged by unique product (not raw matcher hits), so the same agent exposing both a bespoke env var and AGENT=<name> is not ambiguous with itself. When zero known agents match and AGENT is set to a non-empty value, detection returns unknown.

New detections: amp, augment, copilot-vscode, goose, kiro, windsurf. Goose and Amp also match on AGENT=goose and AGENT=amp respectively. Presence-only matchers now treat an empty env value as set (matching the Go SDK's os.LookupEnv semantics), so CLAUDECODE="" counts as Claude Code.
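The data model described above can be sketched roughly as follows. This is an illustration, not the SDK's code: the class names (`EnvMatcherSketch`, `AgentRecordSketch`, `AgentLookupSketch`) are hypothetical, the environment is passed in as a map so the logic is easy to exercise, and only the env vars named in this PR are listed.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One env-var check: presence-only when expected is null, exact value otherwise. */
final class EnvMatcherSketch {
  final String envVar;
  final String expected;

  private EnvMatcherSketch(String envVar, String expected) {
    this.envVar = envVar;
    this.expected = expected;
  }

  static EnvMatcherSketch present(String envVar) {
    return new EnvMatcherSketch(envVar, null);
  }

  static EnvMatcherSketch equalTo(String envVar, String value) {
    return new EnvMatcherSketch(envVar, value);
  }

  boolean matches(Map<String, String> env) {
    // Presence-only: an empty value still counts as set.
    if (!env.containsKey(envVar)) {
      return false;
    }
    return expected == null || expected.equals(env.get(envVar));
  }
}

/** A product name plus the matchers that identify it. */
final class AgentRecordSketch {
  final String product;
  final List<EnvMatcherSketch> matchers;

  AgentRecordSketch(String product, EnvMatcherSketch... matchers) {
    this.product = product;
    this.matchers = Arrays.asList(matchers);
  }

  boolean fires(Map<String, String> env) {
    return matchers.stream().anyMatch(m -> m.matches(env));
  }
}

final class AgentLookupSketch {
  // Hypothetical subset of the agent list, for illustration only.
  static final List<AgentRecordSketch> KNOWN = Arrays.asList(
      new AgentRecordSketch("amp",
          EnvMatcherSketch.present("AMP_CURRENT_THREAD_ID"),
          EnvMatcherSketch.equalTo("AGENT", "amp")),
      new AgentRecordSketch("claude-code", EnvMatcherSketch.present("CLAUDECODE")),
      new AgentRecordSketch("goose",
          EnvMatcherSketch.present("GOOSE_TERMINAL"),
          EnvMatcherSketch.equalTo("AGENT", "goose")));

  static String lookup(Map<String, String> env) {
    // Count unique products, not raw matcher hits.
    Set<String> products = new LinkedHashSet<>();
    for (AgentRecordSketch agent : KNOWN) {
      if (agent.fires(env)) {
        products.add(agent.product);
      }
    }
    if (products.size() == 1) {
      return products.iterator().next();
    }
    if (products.size() > 1) {
      return ""; // two distinct products: ambiguous
    }
    String agent = env.get("AGENT");
    return (agent != null && !agent.isEmpty()) ? "unknown" : "";
  }
}
```

Under these assumptions, `AgentLookupSketch.lookup(Map.of("GOOSE_TERMINAL", "1", "AGENT", "goose"))` yields `goose`, because both matchers belong to the same product.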

Test plan

- Unit tests cover every new agent (goose, amp, augment, copilot-vscode, kiro, windsurf)
- AGENT=goose alone detects goose
- GOOSE_TERMINAL=1 + AGENT=goose detects goose (not ambiguous, same product)
- AMP_CURRENT_THREAD_ID + AGENT=amp detects amp (not ambiguous)
- AGENT=someweirdthing falls back to unknown
- AGENT="" does not trigger the unknown fallback
- AGENT=goose + CLAUDECODE=1 returns empty (ambiguity between two distinct products)
- mvn -pl databricks-sdk-java test -Dtest=UserAgentTest passes (34 tests)
- mvn -pl databricks-sdk-java spotless:check clean

Add detection for Goose, Amp, Augment, VS Code Copilot, Kiro, and Windsurf. Also honor the agents.md standard AGENT env var with an "unknown" fallback when set to a value we don't recognize.

Switches the detection data model from (envVar, product) pairs to agent records with a list of matchers. Each agent fires if any of its matchers fires (presence-only or exact value). Ambiguity is judged by unique product, not raw matcher hits, so the same agent setting both a bespoke var and AGENT=<name> is not ambiguous.

Co-authored-by: Isaac
Signed-off-by: simon <simon.faltum@databricks.com>

- Add NEXT_CHANGELOG.md entry covering the expanded agent list, the AGENT standard, and the empty-string semantics change.
- When the main matcher loop finds no match and AGENT is set to a known product name, return that product name instead of "unknown" (implicit known-product fallback). Known matchers still win over the fallback, so AGENT=cursor + CLAUDECODE=1 still yields claude-code.
- Restore alphabetical ordering: openclaw before opencode.
- Add provenance comments on new agent entries (goose, amp, augment, copilot-vscode, kiro, windsurf).
- New tests: testAgentProviderAgentEnvAmp, testAgentProviderAgentEnvCursor, testAgentProviderKnownMatcherWinsOverAgentFallback.

Co-authored-by: Isaac
Signed-off-by: simon <simon.faltum@databricks.com>

Previously, agents like amp and goose had dual matchers: their explicit env var (AMP_CURRENT_THREAD_ID, GOOSE_TERMINAL) plus AGENT=<name>. This caused asymmetric ambiguity: AGENT=goose + CLAUDECODE=1 resolved to "" (both matchers fired on different products), while AGENT=cursor + CLAUDECODE=1 resolved to "claude-code" (only claude-code matched, cursor was handled by the AGENT fallback which does not trigger once an explicit matcher has fired).

The rule is now uniform: explicit env var matchers always take precedence over the generic AGENT=<name> signal. AGENT is treated purely as a fallback for agents without an explicit matcher, or for products we do not yet specifically recognize.

Changes:

- Remove per-agent AGENT=<name> matchers from amp and goose entries. Those products still set AGENT=<name>; the central fallback in lookupAgentProvider handles them.
- Update the lookupAgentProvider doc comment to reflect the new rule.
- Flip the existing AGENT=goose + CLAUDECODE=1 test to expect "claude-code" and rename accordingly.
- Add test for GOOSE_TERMINAL=1 + AGENT=cursor -> "goose".
- Add test for COPILOT_CLI=1 + COPILOT_MODEL=gpt-4 -> "" (documents the known, intentional ambiguity for Copilot CLI BYOK users).
- Update NEXT_CHANGELOG entry to mention precedence rule.

Signed-off-by: simon <simon.faltum@databricks.com>
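A minimal sketch of the uniform precedence rule from this commit, assuming a simplified model in which every explicit matcher is presence-only. `AgentPrecedenceSketch`, `EXPLICIT`, and `KNOWN_PRODUCTS` are hypothetical names, not the SDK's identifiers, and `cursor` appears only as a known product name because its explicit env var is not given in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AgentPrecedenceSketch {
  // Explicit presence-only matchers: env var -> product (illustrative subset).
  static final Map<String, String> EXPLICIT = new LinkedHashMap<>();
  // Product names the AGENT fallback recognizes.
  static final Set<String> KNOWN_PRODUCTS = new TreeSet<>();

  static {
    EXPLICIT.put("AMP_CURRENT_THREAD_ID", "amp");
    EXPLICIT.put("CLAUDECODE", "claude-code");
    EXPLICIT.put("GOOSE_TERMINAL", "goose");
    KNOWN_PRODUCTS.addAll(EXPLICIT.values());
    KNOWN_PRODUCTS.add("cursor"); // explicit matcher omitted in this sketch
  }

  static String lookup(Map<String, String> env) {
    Set<String> matched = new TreeSet<>();
    for (Map.Entry<String, String> entry : EXPLICIT.entrySet()) {
      if (env.containsKey(entry.getKey())) {
        matched.add(entry.getValue());
      }
    }
    // Explicit matchers always win; AGENT is never consulted once one fires.
    if (matched.size() == 1) {
      return matched.iterator().next();
    }
    if (matched.size() > 1) {
      return ""; // ambiguity between distinct products, as of this commit
    }
    // Fallback: AGENT=<known product> maps to it, any other non-empty value is unknown.
    String agent = env.get("AGENT");
    if (agent == null || agent.isEmpty()) {
      return "";
    }
    return KNOWN_PRODUCTS.contains(agent) ? agent : "unknown";
  }
}
```

With this shape, AGENT=goose + CLAUDECODE=1 and AGENT=cursor + CLAUDECODE=1 both resolve to `claude-code`, which removes the asymmetry described above.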

Signed-off-by: simon <simon.faltum@databricks.com>

Nested agents (e.g. a Cursor CLI subagent spawned by Claude Code) set multiple agent env vars on the same process. The previous ambiguity guard silently dropped the signal in that case. Report "multiple" instead so the stacked case is visible in telemetry.

Also collapse the known BYOK false positive where Copilot CLI users have COPILOT_MODEL set alongside COPILOT_CLI: that pair now reports "copilot-cli" rather than "multiple".

Co-authored-by: Isaac
Signed-off-by: simon <simon.faltum@databricks.com>
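The "multiple" behavior and the BYOK collapse might look something like the sketch below. Two things here are assumptions rather than facts from this PR: the mapping of COPILOT_MODEL to `copilot-vscode` (the PR does not say which product that variable belongs to), and the way the collapse is implemented. `NestedAgentSketch` is a hypothetical name, and the AGENT fallback is left out to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class NestedAgentSketch {
  // Explicit presence-only matchers: env var -> product (illustrative subset).
  static final Map<String, String> EXPLICIT = new LinkedHashMap<>();

  static {
    EXPLICIT.put("CLAUDECODE", "claude-code");
    EXPLICIT.put("COPILOT_CLI", "copilot-cli");
    // ASSUMPTION: the product behind COPILOT_MODEL is not stated in the PR.
    EXPLICIT.put("COPILOT_MODEL", "copilot-vscode");
    EXPLICIT.put("GOOSE_TERMINAL", "goose");
  }

  static String lookup(Map<String, String> env) {
    Set<String> matched = new TreeSet<>();
    for (Map.Entry<String, String> entry : EXPLICIT.entrySet()) {
      if (env.containsKey(entry.getKey())) {
        matched.add(entry.getValue());
      }
    }
    // BYOK collapse: when Copilot CLI is detected, ignore the COPILOT_MODEL hit.
    if (matched.contains("copilot-cli")) {
      matched.remove("copilot-vscode");
    }
    if (matched.isEmpty()) {
      return ""; // AGENT fallback omitted in this sketch
    }
    // Stacked agents are reported as "multiple" instead of being dropped.
    return matched.size() == 1 ? matched.iterator().next() : "multiple";
  }
}
```

Reporting a sentinel value instead of an empty string keeps the nested case countable in telemetry without guessing which agent is the outer one.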

## Summary

Adds detection for 15 AI coding agents (amp, antigravity, augment, claude-code, cline, codex, copilot-cli, copilot-vscode, cursor, gemini-cli, goose, kiro, openclaw, opencode, windsurf) so the SDK emits a single `agent/<name>` segment in its user-agent string when an agent is identified. Mirrors parallel work in the Go (databricks/databricks-sdk-go#1637), Java (databricks/databricks-sdk-java#768), and Python (databricks/databricks-sdk-py#1394) SDKs so all four SDKs ship the same canonical list and precedence rules.

## Why

Databricks wants visibility into which AI coding agents are calling our APIs so that we can understand adoption, prioritize fixes for the environments our customers use, and detect compatibility issues early. The three sibling SDKs just landed this feature; the JS SDK has a smaller detection list (9 agents), emits one segment per detected agent instead of a single canonical segment, and does not honor the `AGENT=<name>` standard from agents.md. Without this change, traffic from JS SDK users running inside agents is invisible or reported inconsistently with the other SDKs.

The library policy in `.agent/rules/libraries.mdc` prefers picking a dependency over hand-rolling. We intentionally deviate here: the canonical agent list, env var names, and precedence rules are coordinated across four SDKs, and existing libraries (`std-env`, `@vercel/detect-agent`) cover different subsets of agents, apply different precedence, and would re-introduce drift the moment we add a new agent. Implementation is ~80 lines with zero dependencies and matches the Go/Java/Python implementations.

## What changed

### Interface changes

- **`packages/core/src/clientinfo/agent.ts`** (new) - Exports `agentProvider()` (cached for the process lifetime) and `lookupAgentProvider()` (uncached, primarily for tests). `clearAgentCache()` is exported from the module file (not the barrel) for tests only, matching the pattern documented in `.agent/rules/testing.mdc` for intentionally-unbarreled symbols.
- **`packages/core/src/clientinfo/index.ts`** - Adds `agentProvider` to the public barrel.

### Behavioral changes

- `createDefault()` now appends at most **one** `agent/<name>` segment instead of one per matching env var. When two explicit matchers fire simultaneously (ambiguity), no `agent/` segment is emitted.
- `AGENT=<name>` is now honored as a fallback. When no explicit env var matches, `AGENT=<known-product>` maps to that product, any other non-empty `AGENT` value maps to `agent/unknown`, and an empty or unset `AGENT` emits nothing.
- Explicit env vars always win over `AGENT=<name>` (e.g. `CLAUDECODE=1` + `AGENT=goose` reports `claude-code`).
- Detection is cached for the process lifetime, matching Go's `sync.Once`, Java's volatile lazy init, and Python's `_agent_provider` sentinel.
- Agent list grows from 9 to 15: adds amp, augment, copilot-vscode, goose, kiro, windsurf. Existing nine agents continue to work.

### Internal changes

- The inlined `KNOWN_AGENTS` list and `detectAgents()` function in `packages/core/src/clientinfo/default.ts` move to the new module.
- The existing `default.test.ts` test case `multiple agents all reported` is replaced by `multiple agents are ambiguous and omit the agent segment` to reflect the new ambiguity semantics. Two new cases cover the `AGENT` fallback path. Adds `clearAgentCache()` calls in `beforeEach`/`afterEach` since detection is now cached.
- `packages/core/vitest.config.browser.ts` excludes `tests/clientinfo/agent.test.ts` for the same reason `default.test.ts` is excluded: agent detection reads `process.env` and is Node-only.

## How is this tested?

- New `packages/core/tests/clientinfo/agent.test.ts` mirrors the Go test cases from `useragent/agent_test.go`: every agent detected via its primary env var, empty-string env values counting as set, ambiguity when two explicit matchers fire, `AGENT` fallback for known and unknown values, explicit env vars winning over `AGENT=<name>`, the pinned `COPILOT_CLI` + `COPILOT_MODEL` ambiguity case for Copilot CLI BYOK users, and cache persistence after env changes.
- `npm run format:check`, `npm run lint`, `npm run typecheck`, `npm test`, and `npm run test:browser` all pass. Core package runs 240 unit tests (29 new) and 150 browser tests.

---------

Signed-off-by: simon <simon.faltum@databricks.com>
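The text above mentions that the Java SDK caches detection with a volatile lazy init. One common way to write that pattern is double-checked locking, sketched below. `CachedAgentProviderSketch` and its `clear()` method are hypothetical names for illustration; the real SDK code may be structured differently. The lookup is injected as a `Supplier` and is assumed to return a non-null string (empty when no agent is detected).

```java
import java.util.function.Supplier;

final class CachedAgentProviderSketch {
  private final Supplier<String> lookup;
  // null means "not computed yet"; volatile makes the published value visible to all threads.
  private volatile String cached;

  CachedAgentProviderSketch(Supplier<String> lookup) {
    this.lookup = lookup;
  }

  String get() {
    String value = cached;
    if (value == null) {
      synchronized (this) {
        value = cached;
        if (value == null) {
          // Runs at most once until clear() is called.
          value = lookup.get();
          cached = value;
        }
      }
    }
    return value;
  }

  /** Test-only reset, analogous in spirit to the clearAgentCache() helper described above. */
  void clear() {
    synchronized (this) {
      cached = null;
    }
  }
}
```

Because the first result is kept for the life of the object, later changes to the environment do not alter what `get()` returns, which is the cache-persistence behavior the tests above pin down.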

…ection-expand Signed-off-by: simon <simon.faltum@databricks.com> # Conflicts: # NEXT_CHANGELOG.md

Merging main bumped the project version to 0.104.0 but left the committed lockfile.json files pinned to 0.103.0, failing the maven-lockfile validation step. Regenerate both lockfiles under JDK 11 (matching the CI configuration) and run fix-lockfile to rewrite JFrog proxy URLs back to Maven Central. Co-authored-by: Isaac Signed-off-by: simon <simon.faltum@databricks.com>

github-actions · 2026-04-20T20:49:27Z

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-java

Inputs:

PR number: 768
Commit SHA: f7db288e86df882b2c27ace7274ec7bb3aac55e4

Checks will be approved automatically on success.

simonfaltum temporarily deployed to test-trigger-is April 20, 2026 08:37 — with GitHub Actions Inactive

simonfaltum had a problem deploying to test-trigger-is April 20, 2026 08:38 — with GitHub Actions Failure

simonfaltum added 3 commits April 20, 2026 11:07

Simplify agent detection data model

3e12cec

Signed-off-by: simon <simon.faltum@databricks.com>

simonfaltum mentioned this pull request Apr 20, 2026

Add AI agent detection to user-agent header databricks/sdk-js#88

Merged

simonfaltum requested a review from mihaimitrea-db April 20, 2026 09:55

mihaimitrea-db approved these changes Apr 20, 2026


Merge remote-tracking branch 'origin/main' into simonfaltum/agent-det…

d7935e2


simonfaltum temporarily deployed to test-trigger-is April 20, 2026 13:26 — with GitHub Actions Inactive

simonfaltum had a problem deploying to test-trigger-is April 20, 2026 13:27 — with GitHub Actions Failure

simonfaltum temporarily deployed to test-trigger-is April 20, 2026 14:35 — with GitHub Actions Inactive

Merge branch 'main' into simonfaltum/agent-detection-expand

f7db288

simonfaltum temporarily deployed to test-trigger-is April 20, 2026 20:47 — with GitHub Actions Inactive

simonfaltum enabled auto-merge April 20, 2026 20:48

simonfaltum temporarily deployed to test-trigger-is April 20, 2026 20:49 — with GitHub Actions Inactive

simonfaltum added this pull request to the merge queue Apr 20, 2026

Merged via the queue into main with commit 17f558c Apr 20, 2026
16 checks passed

simonfaltum deleted the simonfaltum/agent-detection-expand branch April 20, 2026 21:04

simonfaltum merged 8 commits into main from simonfaltum/agent-detection-expand
