93 changes: 93 additions & 0 deletions docs/agent-code.md
@@ -0,0 +1,93 @@
# Agent identification (agent_code & agentId)

dws tags every MCP request with **which agent host is driving it** and a
**per-instance id**, so usage can be sliced by channel/instance in the data
warehouse. This page is the integration contract.

## What dws sends on the wire

| Header | Meaning | Granularity |
|--------|---------|-------------|
| `x-dingtalk-dws-agent-code` | which agent host (claudecode / codex / qoder / cursor / custom …) | channel |
| `x-dws-agent-instance-id` | `dwsa_<base62>` derived from `machineId + agent_code` | machine × channel |
| `x-dws-agent-id` | stable per-install machine id (v1-compatible) | machine |
| `X-Cli-Version` | dws CLI version (segments old vs new clients) | — |

`x-dws-agent-id` keeps its original machine-level meaning for backward
compatibility; `x-dws-agent-instance-id` is the new per-(machine × channel)
value. Old clients never send the instance id, and send `agent_code` only when
the host set `DINGTALK_DWS_AGENTCODE`; treat a missing value as
"legacy/unknown", not an error.

## How `agent_code` is resolved (confidence ladder)

1. **T0 — explicit declaration:** `DINGTALK_DWS_AGENTCODE=<code>`. **Use this.**
2. **T1 — verified env signature:** an agent that auto-sets a distinctive var
(`CLAUDECODE`, `CODEX_SANDBOX`, `OPENCLAW_BUNDLE_ROOT`, `HERMES_HOME`).
3. **T2 — `VSCODE_BRAND`:** every VS Code fork declares its brand — one rule
covers Cursor / Windsurf / Trae / Qoder / Kiro / … incl. future forks.
4. **T3 — macOS `__CFBundleIdentifier`:** known agent app bundles.
5. **T4 — `custom`:** unknown host. Never guessed.

## Declaring your agent (recommended — the only fully-general path)

Auto-detection cannot cover every agent: most terminal agents (gemini/
antigravity, aider, opencode, qwen-code, crush, goose, kimi, amazon-q,
continue, …) expose **no reliable self-identifying env var** — only user-set
API keys, which must not be used as identity. The robust answer is: **the host
sets `DINGTALK_DWS_AGENTCODE` in the env block where it launches dws as an MCP
server.** This is accurate for any agent, on any OS, and is future-proof.

MCP server config example (JSON-style hosts):
```jsonc
{
  "mcpServers": {
    "dingtalk-workspace": {
      "command": "dws",
      "args": ["mcp", "..."],
      "env": { "DINGTALK_DWS_AGENTCODE": "your-agent-code" }
    }
  }
}
```
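
For hosts that spawn dws programmatically rather than through a JSON config, the same declaration can be sketched in Go with `os/exec`. This is an illustration under assumptions: `dwsCommand` is a hypothetical helper, and the argument list is reduced to `mcp` because the remaining arguments are elided above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// dwsCommand builds (but does not start) a command that launches dws with the
// agent_code declared in its environment. The parent environment is inherited
// and DINGTALK_DWS_AGENTCODE is appended last; os/exec uses the last value
// when a key appears more than once, so this overrides any inherited setting.
func dwsCommand(agentCode string, args ...string) *exec.Cmd {
	cmd := exec.Command("dws", args...)
	cmd.Env = append(os.Environ(), "DINGTALK_DWS_AGENTCODE="+agentCode)
	return cmd
}

func main() {
	cmd := dwsCommand("your-agent-code", "mcp")
	// Show the declaration that the child process would see.
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```

Setting the variable on the child process only, instead of exporting it in the host's own environment, keeps the declaration scoped to the dws subprocess.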

### Canonical codes

`claudecode`, `codex`, `cursor`, `vscode`, `qoder`, `windsurf`, `trae`,
`workbuddy`, `openclaw`, `hermes`, `codebuddy`, `comate`, `lingma`, `gemini`,
`aider`, `opencode`, `goose`, `crush`, `kimi`, `amazonq`, `continue`, …
Use a stable lowercase slug. Unrecognized values are not rejected: they are
lowercased, stripped of spaces, and otherwise passed through unchanged, so a new
agent name flows through cleanly.

## Trust & limitations — READ THIS

**`agent_code` AND the ids (`x-dws-agent-id`, `x-dws-agent-instance-id`) are
self-reported, best-effort signals, NOT an authenticated identity.**

- `agent_code`: every declaration/auto-detect signal is an env var the
host/user controls — spoofable (`export CLAUDECODE=1` → dws reports
`claudecode`).
- The ids are **even easier to forge**: they are generated, stored, and sent
entirely client-side. `machineId` is a random UUID in the plaintext
`~/.dws/identity.json` (which the user owns), and the instance id is just
`sha256(machineId + agent_code)`. Editing that one file — or rewriting the
header — lets anyone mint, split, rotate, or impersonate ids at will. The
`dwsa_` prefix does NOT make it a secure identifier.

- ✅ **Fit for statistics / observability** (the intended use): there is no
incentive to misreport one's own agent, and real hosts emit real signals, so
aggregate per-channel metrics are reliable in practice.
- ❌ **NOT fit for authentication, authorization, rate-limiting, billing, or
revocation.** Anything where a party benefits from lying must not trust this
field. For control-plane use you need a gateway-issued **authoritative**
agentId bound to a verified credential (clientId / PAT / OAuth) — a separate,
heavier mechanism, deliberately out of scope here.

Treat `agent_code` / `x-dws-agent-instance-id` as analytics dimensions only.
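
To make the forgeability concrete, here is a sketch of how an id of this shape could be derived. It assumes plain string concatenation, an untruncated SHA-256 digest, and base62 encoding via `math/big`; the real `id.ResolveAgentID` is not shown on this page and may differ in every one of those details. `deriveInstanceID` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// deriveInstanceID hashes machineID+agentCode and renders the digest in
// base62 behind the dwsa_ prefix. Assumed construction, for illustration only.
func deriveInstanceID(machineID, agentCode string) string {
	sum := sha256.Sum256([]byte(machineID + agentCode))
	return "dwsa_" + new(big.Int).SetBytes(sum[:]).Text(62)
}

func main() {
	const machineID = "00000000-0000-4000-8000-000000000000" // made-up UUID

	a := deriveInstanceID(machineID, "claudecode")
	again := deriveInstanceID(machineID, "claudecode")
	b := deriveInstanceID(machineID, "qoder")

	fmt.Println(a == again) // true: deterministic, no secret involved
	fmt.Println(a != b)     // true: one id per (machine, agent_code) pair
}
```

Because the function has no secret input, anyone who picks a `machineId` and an `agent_code` can compute a well-formed id offline, which is why the id is only suitable as an analytics dimension.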

## Gateway side (required for the data to land)

dws sending the headers is necessary but not sufficient. The gateway must:
1. add `x-dingtalk-dws-agent-code`, `x-dws-agent-instance-id`, `X-Cli-Version`
to the upstream-header pass-through allowlist (otherwise they are stripped);
2. log them as fields, and deliver them to the warehouse (alongside the
existing flow-control / execution logs).
23 changes: 22 additions & 1 deletion internal/app/runner.go
@@ -687,7 +687,28 @@ func resolveIdentityHeaders() map[string]string {
 	if sessionID == "" {
 		sessionID = os.Getenv(envRewindSessionID)
 	}
-	agentCode, _ := authpkg.AgentCodeFromEnv()
+	// Resolve the agent_code (accuracy-first; unknown hosts -> custom) and the
+	// per-(machine × agent_code) instance id. This is what makes agent_code
+	// actually report a value: previously it was sent only when the host
+	// injected DINGTALK_DWS_AGENTCODE (empty ~99.98% of the time), so the
+	// gateway logged no agent_code at all. DetectAgentCode always yields a code.
+	//
+	// Backward-compat by design (additive, not breaking):
+	//   - x-dws-agent-id keeps its v1 meaning = machine-level install UUID
+	//     (set by id.Headers() above), so old/new clients stay comparable.
+	//   - x-dws-agent-instance-id is NEW: the per-(machine × agent_code) id.
+	//     Old clients don't send it, which is itself a clean old/new signal.
+	// Note: x-dws-channel (DWS_CHANNEL) is a separate axis, untouched.
+	agentCode, agentCodeSig := authpkg.DetectAgentCode()
+	headers["x-dws-agent-instance-id"] = id.ResolveAgentID(defaultConfigDir(), agentCode, agentCodeSig)
+
+	// Emit the CLI version on the wire so the gateway can segment old vs new
+	// clients (and scope agent_code coverage / adoption). The header constant
+	// existed but was never set; wire it here.
+	if version != "" {
+		headers[transport.HeaderVersion] = version
+	}
+
 	envHeaders := map[string]string{
 		"x-dingtalk-agent": os.Getenv(envDingtalkAgent),
 		"x-dingtalk-dws-agent-code": agentCode,
52 changes: 50 additions & 2 deletions internal/app/runner_test.go
@@ -328,14 +328,62 @@ func TestResolveIdentityHeadersForwardsAgentCode(t *testing.T) {
 	}
 }
 
+func TestResolveIdentityHeadersAgentIdentityFields(t *testing.T) {
+	setupRuntimeCommandTest(t)
+	t.Setenv(authpkg.AgentCodeEnv, "qoder")
+
+	headers := resolveIdentityHeaders()
+
+	// x-dws-agent-id stays machine-level (v1 install UUID): non-empty and NOT
+	// the dwsa_ instance form — this is the cross-version continuity anchor.
+	machineID := headers["x-dws-agent-id"]
+	if machineID == "" {
+		t.Fatal("x-dws-agent-id must stay populated (machine-level)")
+	}
+	if strings.HasPrefix(machineID, "dwsa_") {
+		t.Fatalf("x-dws-agent-id must remain machine-level, got instance form %q", machineID)
+	}
+
+	// x-dws-agent-instance-id is the NEW per-(machine × agent_code) id.
+	instID := headers["x-dws-agent-instance-id"]
+	if !strings.HasPrefix(instID, "dwsa_") {
+		t.Fatalf("x-dws-agent-instance-id must be a derived instance id, got %q", instID)
+	}
+	if instID == machineID {
+		t.Fatal("instance id must differ from machine id")
+	}
+
+	// CLI version must now be on the wire so the gateway can segment old/new.
+	if headers[transport.HeaderVersion] == "" {
+		t.Fatalf("%s must be emitted", transport.HeaderVersion)
+	}
+}
+
 func TestResolveIdentityHeadersIgnoresReversedAgentCodeEnv(t *testing.T) {
 	setupRuntimeCommandTest(t)
 	t.Setenv(authpkg.AgentCodeEnv, "")
 	t.Setenv("DWS_DINGTALK_AGENTCODE", " compat ")
+	// Isolate from ambient agent-host detection signals so this test asserts
+	// only the reversed-env-name behavior (the suite itself may run under
+	// Claude Code / Qoder / VS Code, whose signals would otherwise be detected).
+	for _, k := range []string{
+		"CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT",
+		"OPENCLAW_BUNDLE_ROOT", "OPENCLAW_RUNTIME_ROLE", "HERMES_HOME",
+		"CODEX_SANDBOX", "VSCODE_BRAND", "__CFBundleIdentifier",
+	} {
+		t.Setenv(k, "")
+	}
 
 	headers := resolveIdentityHeaders()
-	if got := headers["x-dingtalk-dws-agent-code"]; got != "" {
-		t.Fatalf("x-dingtalk-dws-agent-code = %q, want empty because reversed env is ignored", got)
+	// The reversed env name must never be consumed. With no canonical
+	// declaration and no host signature, agent_code resolves to the honest
+	// "custom" fallback — and crucially is NOT the reversed value.
+	got := headers["x-dingtalk-dws-agent-code"]
+	if got == "compat" {
+		t.Fatalf("x-dingtalk-dws-agent-code = %q, reversed env must be ignored", got)
+	}
+	if got != authpkg.AgentCodeCustom {
+		t.Fatalf("x-dingtalk-dws-agent-code = %q, want %q (fallback)", got, authpkg.AgentCodeCustom)
 	}
 }

161 changes: 161 additions & 0 deletions internal/auth/agent_code_detect.go
@@ -0,0 +1,161 @@
// Copyright 2026 Alibaba Group
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// agent_code_detect.go resolves the agent_code — which agent HOST is driving
// dws (claudecode / qoder / cursor / vscode / openclaw / hermes / ...). It
// fills the x-dingtalk-dws-agent-code header for per-channel statistics.
//
// SEPARATE axis from DWS_CHANNEL / x-dws-channel (a distribution channel code);
// the two are never conflated here.
//
// Design contract — ACCURACY OVER COVERAGE, but maximize accurate coverage:
// - Prefer generalizable, host-declared signals so one rule covers a whole
// family (VSCODE_BRAND covers every VS Code fork, present and future).
// - Every per-host signature below is OBSERVED on a real host (live process
// env via `ps eww`, or the app bundle Info.plist), not guessed.
// - Anything unidentified falls back to AgentCodeCustom — never guess.
// - Deliberately NOT used: TERM_PROGRAM (reports the terminal, e.g. iTerm,
// not the agent host) and fuzzy parent-process name matching.
package auth

import (
	"os"
	"strings"
)

// AgentCodeCustom is the honest fallback for any host we cannot identify.
const AgentCodeCustom = "custom"

// hostSignature is a verified env fingerprint for a known agent host. EnvKeys
// match when any listed key is present and non-empty.
type hostSignature struct {
	Code    string
	EnvKeys []string
}

// knownSignatures: CLI / daemon agents that inject a distinctive env var, which
// the dws subprocess they spawn inherits. All verified on a real machine
// (2026-06-16) via live process env / launch env — not guessed.
var knownSignatures = []hostSignature{
	// Claude Code — verified: CLAUDECODE=1, CLAUDE_CODE_ENTRYPOINT=cli.
	{Code: "claudecode", EnvKeys: []string{"CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT"}},
	// OpenClaw — verified on the running daemon: OPENCLAW_BUNDLE_ROOT.
	{Code: "openclaw", EnvKeys: []string{"OPENCLAW_BUNDLE_ROOT", "OPENCLAW_RUNTIME_ROLE"}},
	// Hermes — verified on the running gateway: HERMES_HOME.
	{Code: "hermes", EnvKeys: []string{"HERMES_HOME"}},
	// OpenAI Codex — CODEX_SANDBOX is auto-set by Codex for the subprocesses it
	// spawns (e.g. CODEX_SANDBOX=seatbelt on macOS), and Codex filters this
	// CODEX_-prefixed name out of user .env to prevent spoofing — so its
	// presence reliably means "running under Codex".
	// Source: developers.openai.com/codex/concepts/sandboxing
	{Code: "codex", EnvKeys: []string{"CODEX_SANDBOX"}},
}

// NOTE on coverage limits (honest, not a TODO to silently ignore):
// Most terminal agents (gemini-cli/antigravity, aider, opencode, qwen-code,
// crush, goose, kimi, amazon-q, continue, ...) expose NO reliable
// self-identifying env marker — only user-set API-key/config vars, which we
// must not key off (a user setting GEMINI_API_KEY is not "running under
// gemini"). They therefore resolve to custom unless they declare themselves.
//
// The authoritative, fully-general path to 100% coverage is the T0 declaration
// contract: a host sets DINGTALK_DWS_AGENTCODE=<code> when it launches dws.
// That is accurate for ANY agent (present or future) on ANY OS, and is what an
// integrating host should wire up. Auto-detection (signatures / VSCODE_BRAND /
// bundle id) is a best-effort supplement for hosts that have not declared.

// bundleIDToCode maps macOS app bundle identifiers to agent codes. The bundle
// id is exposed via __CFBundleIdentifier and inherited by child processes the
// IDE spawns (including dws), so it identifies the host even from an integrated
// terminal. Verified from each app's Info.plist (2026-06-16). Only known agent
// bundles map; everything else (iTerm, Terminal, ...) falls through to custom.
//
// macOS-only signal: __CFBundleIdentifier does not exist on Linux/Windows, so
// this map is simply a no-op there (os.Getenv returns "").
var bundleIDToCode = map[string]string{
	"com.qoder.ide":                 "qoder",
	"com.todesktop.230313mzl4w4u92": "cursor", // Cursor's ToDesktop bundle id
	"com.microsoft.VSCode":          "vscode",
	"com.workbuddy.workbuddy":       "workbuddy",
}

// DetectAgentCode resolves the agent_code via a confidence ladder and returns
// the normalized code plus the signal that decided it:
//
// T0 explicit host declaration (DINGTALK_DWS_AGENTCODE — dedicated field)
// T1 verified per-agent env signature (CLI/daemon agents)
// T2 VSCODE_BRAND value (every VS Code fork declares its brand)
// T3 macOS app bundle id (known agent bundles only)
// T4 fallback -> custom (never guess)
func DetectAgentCode() (code string, signal string) {
	// T0: host explicitly declares its agent_code — highest confidence.
	if v, name := AgentCodeFromEnv(); v != "" {
		return normalizeAgentCode(v), "env:" + name
	}

	// T1: verified per-agent env signature (most specific — wins over the IDE
	// it may be running inside).
	for _, sig := range knownSignatures {
		for _, k := range sig.EnvKeys {
			if strings.TrimSpace(os.Getenv(k)) != "" {
				return sig.Code, "sig:" + k
			}
		}
	}

	// T2: VS Code fork family. The brand value IS the host's self-declaration,
	// so this single rule covers Qoder/Cursor/VS Code/Windsurf/Trae/Kiro/... —
	// including forks that don't exist yet.
	if b := strings.TrimSpace(os.Getenv("VSCODE_BRAND")); b != "" {
		return normalizeAgentCode(b), "env:VSCODE_BRAND"
	}

	// T3: macOS app bundle id (known agent bundles only).
	if id := strings.TrimSpace(os.Getenv("__CFBundleIdentifier")); id != "" {
		if c, ok := bundleIDToCode[id]; ok {
			return c, "bundle:" + id
		}
	}

	// T4: unknown host — honest fallback, no guessing.
	return AgentCodeCustom, "fallback"
}

// normalizeAgentCode maps host-declared names/brands to canonical agent_code
// values. Unrecognized but non-empty input is lowercased, space-stripped and
// kept as-is — still a host declaration, so still accurate (this is what gives
// automatic coverage of new VS Code forks via VSCODE_BRAND).
func normalizeAgentCode(raw string) string {
	s := strings.ToLower(strings.TrimSpace(raw))
	s = strings.ReplaceAll(s, " ", "")
	switch s {
	case "":
		return AgentCodeCustom
	case "claude", "claude-code", "claude_code", "claudecode":
		return "claudecode"
	case "qoder", "qoderwork":
		return "qoder"
	case "workbuddy", "work-buddy":
		return "workbuddy"
	case "visualstudiocode", "code", "code-oss", "vscode":
		return "vscode"
	case "cursor":
		return "cursor"
	case "windsurf":
		return "windsurf"
	case "trae", "traecn":
		return "trae"
	default:
		return s
	}
}