
add file upload support for agent debug mode #3300

Closed
jeffwu-1999 wants to merge 22 commits into main from wzf_add_support_for_uploading_files

@jeffwu-1999 (Contributor)
  • Add file attachment upload/preview/remove UI in debug panel
  • Upload files to MinIO and pass minio_files in agent run params
  • Support file attachments in both debug and compare modes
  • Include attachment info in conversation history
  • Update data_process_service to return img_info alongside chunks
  • Make object_name/presigned_url optional in conversationService types

For example:

[screenshot]
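The run-params flow in the bullets above can be sketched as follows. This is a minimal sketch, not the PR's code: only `minio_files`, `object_name`, and `presigned_url` come from the description; the `MinioFile` fields `name` and `size`, the `query` and `conversation_id` params, and `build_run_params` itself are hypothetical.

```python
from typing import Any, Dict, List, Optional, TypedDict


class MinioFile(TypedDict, total=False):
    # Hypothetical attachment record; object_name/presigned_url are optional per this PR.
    name: str
    size: int
    object_name: Optional[str]
    presigned_url: Optional[str]


def build_run_params(query: str, conversation_id: int, files: List[MinioFile]) -> Dict[str, Any]:
    """Hypothetical helper: attach minio_files to agent run params only when files exist."""
    params: Dict[str, Any] = {"query": query, "conversation_id": conversation_id}
    if files:
        # Drop optional keys left as None so the backend sees them as absent.
        params["minio_files"] = [
            {key: value for key, value in f.items() if value is not None} for f in files
        ]
    return params
```

Omitting `minio_files` entirely when nothing was uploaded keeps plain-text debug runs byte-for-byte identical to the pre-upload request shape.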

DongJiBao2001 and others added 16 commits June 18, 2026 10:20
* ✨Feat:add aidp search tool

* 🗑️ Remove: Delete the standalone AIDP mock server implementation from the project.

* 🐛Bugfix: Update AIDP API endpoint parameters and enhance error logging

* 🔧 Refactor: Implement autouse fixture for supabase mock to ensure structured attributes are preserved during test execution

* 🔧 Refactor: Enhance stubbing of file management service in tests to ensure compatibility with LLM model retrieval and configuration management

* 🐛 Fix stub for file_management_service: look up patched names from sys.modules

The previous stub captured `backend_file_management_module` (the stub itself)
in `_stub_get_llm_model`, so `@patch` decorators modifying
`sys.modules['backend.services.file_management_service']` were never visible.
This caused `TestGetLlmModel` tests to return an unpatched MagicMock instead
of the expected mock_model_instance.

Two changes:
1. `_stub_get_llm_model` now looks up all dependencies from
   `sys.modules['backend.services.file_management_service']` so that runtime
   patches from `@patch(...)` decorators are respected.
2. The stub module provides MagicMock defaults for all attributes that
   `@patch` needs to call `get_original()` on (tenant_config_manager etc.).
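The two changes above amount to late binding through `sys.modules`. A minimal sketch, assuming the stub's dependencies are `tenant_config_manager` (named in the commit) and a hypothetical `OpenAIModel` attribute with a hypothetical `get_model_config` call; `install_stub` is also illustrative:

```python
import sys
import types
from typing import Any
from unittest.mock import MagicMock

MODULE_NAME = "backend.services.file_management_service"


def _stub_get_llm_model(tenant_id: str) -> Any:
    # Resolve dependencies at call time from whatever module is registered now,
    # so @patch(...) on sys.modules[MODULE_NAME] is honored.
    mod = sys.modules[MODULE_NAME]
    model_config = mod.tenant_config_manager.get_model_config(tenant_id)
    return mod.OpenAIModel(model_config)


def install_stub() -> types.ModuleType:
    stub = types.ModuleType(MODULE_NAME)
    # MagicMock defaults give @patch an existing attribute to save and restore.
    stub.tenant_config_manager = MagicMock(name="tenant_config_manager")
    stub.OpenAIModel = MagicMock(name="OpenAIModel")  # hypothetical dependency
    stub.get_llm_model = _stub_get_llm_model
    sys.modules[MODULE_NAME] = stub
    return stub
```

Because the lookup happens inside the function body, both attribute-level patches and a wholesale replacement of the `sys.modules` entry are visible to the stub.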

* 🔧 Refactor: Update test_get_llm_model to improve patching and ensure consistent behavior across environments. Simplified test structure by directly patching `get_llm_model` and its dependencies, enhancing clarity and reliability of test cases.
* 🐛 Bugfix: Adjust agent detail UI layout to accommodate newly added "self-verification" field (#3246)

* Move non-shadcn ui component to other folder

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix inability to select agent from agent space to edit

* Bugfix: Display correct version info when viewing agent details

* Bugfix: Adjust agent detail UI layout to accommodate newly added "self-verification" field

* Add supplementary SQL (#3248)

* Add supplementary SQL

* Increase the limit

* 🐛 Bugfix: Fixed an issue where the MCP service failed to start in a Kubernetes container. (#3254)

[Specification Details]
1. Modify the pod naming logic to convert all non-compliant characters to -.
2. Modify test cases.
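The pod-naming change can be illustrated with a small sanitizer. This sketch assumes pod names follow Kubernetes DNS-1123 label rules (lowercase alphanumerics and `-`, alphanumeric at both ends, at most 63 characters); `sanitize_pod_name` is a hypothetical name, not the actual function in the MCP service.

```python
import re

# Assumption: pod names are kept to DNS-1123 label rules (lowercase alphanumerics
# and '-', starting and ending with an alphanumeric, at most 63 characters).
_NON_COMPLIANT = re.compile(r"[^a-z0-9-]")
_MAX_LEN = 63


def sanitize_pod_name(raw: str) -> str:
    """Hypothetical helper: map every non-compliant character to '-'."""
    name = _NON_COMPLIANT.sub("-", raw.lower())
    # Truncate first, then trim so the result never starts or ends on '-'.
    name = name[:_MAX_LEN].strip("-")
    if not name:
        raise ValueError(f"cannot derive a pod name from {raw!r}")
    return name
```

For example, an MCP service named `MCP_Server.v1` would map to `mcp-server-v1`.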

* 🐛 Bugfix: knowledge_base_search_tool called with TypeError: argument of type 'FieldInfo' is not iterable (#3259)

* 🐛 Bugfix: Fixed an issue where the one-click rename function failed after importing an agent. (#3258)

[Specification Details]
1. The frontend does not pass `agent_id` when calling the `regenerate_name` API.

* Bugfix: Exclude attachments from assistant when saving conversation history (#3261)

* Bump APP_VERSION from v2.2.0 to v2.2.1 (#3268)

The default setting for client-side self-validation is "False".

---------

Co-authored-by: xuyaqi <xuyaqist@gmail.com>
Co-authored-by: hhhhsc701 <56435672+hhhhsc701@users.noreply.github.com>
Co-authored-by: Xia Yichen <iamjasonxia@126.com>
* 111

* issue_solve

* testcase_fix

* test_fix

* Remove unrelated unstructured filename metadata change
…3285)

* fix: parallel unit test runner with file-level subprocess isolation

- Rewrite test/run_all_test.py as file-level parallel runner using
  ThreadPoolExecutor with configurable workers (NEXENT_PYTEST_WORKERS)
  and per-file timeout (NEXENT_PYTEST_FILE_TIMEOUT)
- Add pytest-xdist to backend test extras
- Fix test_mcp_service.py: clear proxy env vars (socks://) in fixture
  to prevent httpx.AsyncClient ValueError
- Fix test_remote_mcp_service.py: mock check_runtime_host_port_available
  to prevent port conflict in container enable test
- Fix test_openai_llm.py: reduce memory leak from repeated module imports
- Update CI workflow: default to parallel mode, add dispatch inputs for
  worker count and per-file timeout

Serial: 229/229 pass (7m7s). Parallel: 229/229 pass (1m1s, ~7x speedup).
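The file-level runner described above can be sketched like this. Only the two environment variable names come from the commit; the defaults (CPU count, 600 s), `FileResult`, and the injectable `make_cmd` are assumptions of this sketch.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class FileResult:
    path: str
    returncode: int
    timed_out: bool


def default_command(path: str) -> List[str]:
    # One pytest process per test file, so sys.modules mocks cannot leak across files.
    return [sys.executable, "-m", "pytest", "-q", path]


def run_file(path: str, timeout: float, make_cmd: Callable[[str], List[str]]) -> FileResult:
    try:
        proc = subprocess.run(make_cmd(path), capture_output=True, text=True, timeout=timeout)
        return FileResult(path, proc.returncode, False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return FileResult(path, -1, True)


def run_all(
    paths: Sequence[str], make_cmd: Callable[[str], List[str]] = default_command
) -> List[FileResult]:
    workers = int(os.environ.get("NEXENT_PYTEST_WORKERS", os.cpu_count() or 1))
    timeout = float(os.environ.get("NEXENT_PYTEST_FILE_TIMEOUT", "600"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, which keeps the summary deterministic.
        return list(pool.map(lambda p: run_file(p, timeout, make_cmd), paths))
```

Threads are enough here because each worker only waits on a child process; the isolation comes from the subprocess, not from the pool.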

* chore: remove unused pytest-xdist dependency

The parallel runner uses ThreadPoolExecutor with per-file subprocess
isolation, not pytest-xdist. The xdist package was added but never
used due to sys.modules mock conflicts during pytest collection.

---------

Co-authored-by: Jinglong Wang <wangjinglong8@huawei.com>
* Move non-shadcn ui component to other folder

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix inability to select agent from agent space to edit

* Bugfix: Display correct version info when viewing agent details

* Bugfix: Adjust agent detail UI layout to accommodate newly added "self-verification" field

* Refactor: update left navigation menu

* Remove the quick configuration page

* Remove comments

* Update i18n
* 🐛 Bugfix: Update HTTP client settings to increase timeout and disable SSL verification in aidp_service and aidp_search_tool (#3280)

* 🐛 Bugfix: Fix page display
…3209)

* fix: resolve skills not exposed to agents and LogLevel enum errors

- Fix LogLevel.WARNING AttributeError by replacing with LogLevel.ERROR
  (smolagents LogLevel enum only has OFF/ERROR/INFO/DEBUG, no WARNING)
  at core_agent.py lines 417 and 804

- Increase skills token budget from 1000 to 4000 in summary_config.py
  to accommodate the verbose 6-step skill usage process (~2500-3500 chars)
  that was being silently dropped by TokenBudgetStrategy

- Add skills sections to English prompt templates (manager + managed)
  mirroring the Chinese template structure with <available_skills> block
  and skill usage requirements section

- Add diagnostic logging in create_agent_info.py and core_agent.py to
  track skills count and component assembly for debugging

- Improve exception handling in _get_skills_for_template() with ERROR
  level logging and full stack trace for better observability

- Add comprehensive test suite (test_context_component_types.py) with
  38 tests covering component types, assembly validation, and semantic
  equivalence between Jinja2 templates and component assembly path

All 104 tests pass (38 backend + 66 SDK), zero regressions.

* fix: resolve dual ContextManager bug and enable context manager by default

- Add atomic replace_components() method to ContextManager to prevent
  race conditions when swapping components on conversation-level CM
- Fix run_agent.py to re-register components on surviving CM after
  overwrite (both MCP and non-MCP paths)
- Guard CM creation in nexent_agent.py with enabled check to avoid
  creating useless CM when context management is disabled
- Change enable_context_manager default from False to True
- Fix numbering consistency: tools and skills always show 1./3. prefix
- Fix indentation in manager_system_prompt_template_en.yaml (6→5 spaces)
- Add tests for replace_components() and component survival after overwrite
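An atomic `replace_components()` can be sketched as "build the new list, then swap it in under a lock". The class below is a hypothetical stand-in; only the `replace_components` name comes from the commit, and components are plain strings for brevity.

```python
import threading
from typing import List, Sequence


class ContextManager:
    """Hypothetical stand-in for the SDK's context manager."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._components: List[str] = []

    def register(self, component: str) -> None:
        with self._lock:
            self._components.append(component)

    def replace_components(self, components: Sequence[str]) -> None:
        # Build the new list first, then swap it in under the lock, so readers
        # never observe a half-cleared or half-populated component list.
        new_components = list(components)
        with self._lock:
            self._components = new_components

    def snapshot(self) -> List[str]:
        with self._lock:
            return list(self._components)
```

A non-atomic "clear, then register each component" sequence would let a concurrent reader see an empty or partial list, which is the race the commit guards against.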

* fix: remove invalid time_str arg and deduplicate test helpers

Remove time_str keyword argument from 12 test calls that caused
TypeError since build_context_components() and
build_skeleton_header_component() do not accept this parameter.

Extract shared mock classes (_MockTool, _MockManagedAgent,
_MockExternalAgent) to module level and introduce _base_kwargs()
and _full_kwargs() helpers to eliminate duplicated blocks,
reducing SonarCloud duplication density below the quality gate.
* Doc: Add design for upgrading context management in Nexent, organized as 16 workstreams.

* docs: complete context management production review

* feat(W1): add type skeleton for ModelCapacityResolver and tokenizer registry

Introduces the contract surface for W1 (Correct Model Token-Capacity
Configuration) so W2/W3 development can begin against stable types. No
runtime behaviour change — resolver/registry implementations land in the
follow-up PR.

New modules:
- sdk/nexent/core/models/capacity_resolver.py: CapabilityProfile and
  ModelCapacitySnapshot (Pydantic v2, frozen), typed ResolverError
  hierarchy, compute_fingerprint() implementing the SHA-256/canonical-JSON
  contract from W1 ADR Decision 3, RESOLVER_VERSION constant, and a
  resolve_capacity() stub.
- sdk/nexent/core/models/tokenizer_registry.py: TokenizerAdapter Protocol,
  empty REGISTRY, FallbackEstimator (char/4 heuristic that always returns
  counting_mode='estimated'), and resolve() function. Family-name
  validation pattern enforces the naming convention fixed in the ADR.
- backend/consts/capability_profiles.py: CATALOG with eight approved
  day-one entries (openai/gpt-4o, openai/gpt-4.1, dashscope/qwen-plus,
  qwen-turbo, glm-5.1, silicon DeepSeek-V4-Flash, Qwen3.6-27B,
  Kimi-K2.6) plus CATALOG_REVISION.
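The tokenizer-registry contract above can be sketched as follows. `TokenizerAdapter`, `REGISTRY`, `FallbackEstimator`, `resolve()`, and `counting_mode='estimated'` come from the commit; the family-name regex, `register()`, and the `"exact"` mode label are assumptions of this sketch.

```python
import math
import re
from typing import Dict, Optional, Protocol, Tuple


class TokenizerAdapter(Protocol):
    def count(self, text: str) -> int: ...


class FallbackEstimator:
    """char/4 heuristic; its counts are always flagged as estimates."""

    def count(self, text: str) -> int:
        return math.ceil(len(text) / 4)


# Hypothetical naming convention: lowercase segments joined by '-', '_' or '.'.
_FAMILY_PATTERN = re.compile(r"[a-z0-9]+(?:[-_.][a-z0-9]+)*")
REGISTRY: Dict[str, TokenizerAdapter] = {}


def register(family: str, adapter: TokenizerAdapter) -> None:
    if not _FAMILY_PATTERN.fullmatch(family):
        raise ValueError(f"invalid tokenizer family name: {family!r}")
    REGISTRY[family] = adapter


def resolve(family: Optional[str]) -> Tuple[TokenizerAdapter, str]:
    """Return (adapter, counting_mode); unknown families fall back to estimation."""
    adapter = REGISTRY.get(family) if family else None
    if adapter is None:
        return FallbackEstimator(), "estimated"
    return adapter, "exact"  # 'exact' label is an assumption of this sketch
```

Returning the counting mode alongside the adapter lets callers record the estimation gap in `unknown_capabilities` instead of silently trusting a heuristic count.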

Design reference: doc/working/context-management-workstreams/
W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (locally hosted; team
sharing channel separate from this repo per doc/.gitignore policy).

Smoke-tested: fingerprint is deterministic and order-independent across
unknown_capabilities and field_sources; ModelCapacitySnapshot rejects
mutation; tokenizer resolve() falls back to estimated for unknown
families; resolve_capacity stub raises NotImplementedError; CATALOG
imports cleanly with all 8 entries.
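The order-independence checked in the smoke test follows from canonicalizing before hashing. A minimal sketch of a SHA-256/canonical-JSON fingerprint, assuming `unknown_capabilities` is a set-like list and `field_sources` is a dict; the ADR's exact canonicalization rules may differ.

```python
import hashlib
import json
from typing import Any, Dict

# Assumption: unknown_capabilities is a set-like list; field_sources is a dict.
_SET_LIKE_FIELDS = ("unknown_capabilities",)


def compute_fingerprint(contract: Dict[str, Any]) -> str:
    normalized = dict(contract)  # shallow copy so the caller's lists are untouched
    for key in _SET_LIKE_FIELDS:
        if key in normalized:
            # Sort so the order in which gaps were recorded cannot change the hash.
            normalized[key] = sorted(normalized[key])
    # sort_keys=True also orders nested dicts such as field_sources.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```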

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(W1): add capacity columns to model_record_t (additive migration)

Adds seven nullable capacity fields to model_record_t so the
ModelCapacityResolver can read operator overrides per W1 ADR:
- context_window_tokens
- max_input_tokens
- max_output_tokens
- default_output_reserve_tokens
- tokenizer_family
- capacity_source
- capability_profile_version

All columns are nullable, no defaults that change semantics. Legacy
max_tokens is left untouched and continues to behave as a deprecated
output-cap alias until consumers migrate (separate follow-up).

Touchpoints:
- docker/sql/v2.2.0_0615_add_capacity_fields_to_model_record_t.sql: idempotent
  upgrade with ALTER TABLE ... ADD COLUMN IF NOT EXISTS + COMMENT ON COLUMN.
- docker/init.sql: fresh-install CREATE TABLE inline plus COMMENT ON COLUMN.
- k8s/helm/nexent/charts/nexent-common/files/init.sql: same for k8s deploys.
- backend/database/db_models.py: ModelRecord ORM columns.
- backend/consts/model.py: ModelRequest Pydantic schema fields so CRUD
  round-trips the new values.

Design reference: doc/working/context-management-workstreams/
W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (Decision 1, schema).

Verification:
- ORM exposes all 7 columns
- Pydantic ModelRequest exposes all 7 fields
- All three SQL files contain 14 occurrences (column + COMMENT per field)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: move W1 ADR to dedicated ADRs directory

Move W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from context-management-workstreams to context-management-workstreams/ADRs for better organization.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(W1): implement resolve_capacity with catalog + operator override

Replaces the resolve_capacity NotImplementedError stub with the real
ModelCapacityResolver per W1 ADR. The resolver:

- Looks up the (provider, model_name) entry in the capability profile
  catalog passed by the caller.
- Merges operator overrides over the profile (operator wins).
- Validates that hard capacity is known and not impossible (output cap
  cannot exceed combined window; capacities must be positive).
- Defaults requested_output_tokens to the profile's
  default_output_reserve_tokens; rejects requests that exceed
  max_output_tokens.
- Derives provider_input_limit_tokens as min(max_input_tokens,
  context_window_tokens - requested_output_tokens) using only the limits
  that are defined.
- Asks tokenizer_registry for (adapter, counting_mode); records
  capability gaps in unknown_capabilities.
- Computes the deterministic SHA-256/canonical-JSON fingerprint from the
  resolved contract and builds an immutable ModelCapacitySnapshot.

The resolver stays pure: the SDK never reads DB or env; backend callers
supply the capability_profiles dict and operator_overrides. This matches
CLAUDE.md's SDK layer rules.

Typed failures raised on invalid input:
- ProviderCapabilityUnknown (no hard capacity)
- InvalidCapacityConfiguration (non-positive values, output > window,
  derived input limit non-positive)
- RequestedOutputExceedsCap (request above max_output_tokens)
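The merge, validate, and derive steps above can be sketched as a pure function. The error class names come from the commit; `resolve_input_limit`, treating "hard capacity" as "a window or an input limit is known", and the `None`-means-no-override convention are assumptions. The real resolver also consults the tokenizer registry and builds a snapshot, which this sketch omits.

```python
from typing import Dict, Mapping, Optional, Tuple


class ResolverError(Exception):
    """Base class mirroring the typed error hierarchy named in the commit."""


class ProviderCapabilityUnknown(ResolverError):
    pass


class InvalidCapacityConfiguration(ResolverError):
    pass


class RequestedOutputExceedsCap(ResolverError):
    pass


_FIELDS = ("context_window_tokens", "max_input_tokens", "max_output_tokens", "default_output_reserve_tokens")


def resolve_input_limit(
    profile: Mapping[str, Optional[int]],
    overrides: Mapping[str, Optional[int]],
    requested_output_tokens: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical sketch: return (provider_input_limit_tokens, requested_output_tokens)."""
    merged: Dict[str, Optional[int]] = {k: profile.get(k) for k in _FIELDS}
    # Operator overrides win over the catalog profile; None means "not overridden".
    merged.update({k: v for k, v in overrides.items() if k in _FIELDS and v is not None})
    window = merged["context_window_tokens"]
    max_in = merged["max_input_tokens"]
    max_out = merged["max_output_tokens"]

    if window is None and max_in is None:
        raise ProviderCapabilityUnknown("no hard input or window capacity is known")
    for name, value in merged.items():
        if value is not None and value <= 0:
            raise InvalidCapacityConfiguration(f"{name} must be positive, got {value}")
    if window is not None and max_out is not None and max_out > window:
        raise InvalidCapacityConfiguration("max_output_tokens exceeds context_window_tokens")

    if requested_output_tokens is None:
        requested_output_tokens = merged["default_output_reserve_tokens"] or 0
    if requested_output_tokens < 0:
        raise InvalidCapacityConfiguration("requested_output_tokens must not be negative")
    if max_out is not None and requested_output_tokens > max_out:
        raise RequestedOutputExceedsCap(f"{requested_output_tokens} > max_output_tokens={max_out}")

    # Use only the limits that are defined, then take the tighter one.
    candidates = [max_in]
    if window is not None:
        candidates.append(window - requested_output_tokens)
    input_limit = min(c for c in candidates if c is not None)
    if input_limit <= 0:
        raise InvalidCapacityConfiguration("derived input limit is non-positive")
    return input_limit, requested_output_tokens
```

With a 128,000-token window and a 4,096-token default reserve, the derived input limit is 128000 - 4096 = 123904 tokens.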

Tests (15, all passing):
- Catalog lookup + override precedence
- Uncataloged with operator-supplied capacity
- Rejection: missing capacity, impossible values, negative values,
  requested-output overflow
- Default requested_output behavior
- Separate-input-limit path (synthetic, no day-one model uses it)
- Combined window + separate input limit takes minimum
- Snapshot immutability (Pydantic ValidationError on mutation)
- Fingerprint determinism and sensitivity to request changes
- Tokenizer estimated-mode flag appears in unknown_capabilities

Design reference: doc/working/context-management-workstreams/
W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(W1 step 4): extend SDK ModelConfig with capacity fields, rename LLM output cap

ModelConfig (sdk/nexent/core/agents/agent_model.py):
- Add max_output_tokens as the preferred name per W1 ADR.
- Keep max_tokens as a deprecated alias; a model_validator backfills the
  unset side so old and new callers both work during migration.
- Add the remaining capacity-snapshot fields so a ModelConfig can carry
  the resolved values from backend service down to the SDK: context_window_tokens,
  max_input_tokens, default_output_reserve_tokens, tokenizer_family,
  capacity_source, capability_profile_version.

OpenAIModel (sdk/nexent/core/models/openai_llm.py):
- Accept max_output_tokens (preferred) and max_tokens (deprecated). If only
  the legacy name is passed, log a debug message and remap to max_output_tokens.
- Internal attribute renamed to self.max_output_tokens; self.max_tokens is
  kept as an alias for any reader.
- chat.completions.create still receives wire field max_tokens; only the
  internal name changed.

NexentAgent.create_model (sdk/nexent/core/agents/nexent_agent.py):
- Construct OpenAIModel with max_output_tokens=model_config.max_output_tokens
  so the new name flows through end-to-end.

Backward compatibility:
- Existing callers that set ModelConfig.max_tokens see no behavior change
  (validator copies it into max_output_tokens; the wire payload is identical).
- Existing callers reading OpenAIModel.max_tokens see no behavior change
  (alias attribute returns the same value).

Verified by table-driven smoke test of all four (max_tokens, max_output_tokens)
combinations on ModelConfig.
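The alias backfill can be sketched with a stdlib dataclass standing in for the Pydantic v2 `model_validator`. `ModelConfigSketch` is hypothetical, and the rule that the preferred name wins when both fields are set is an assumption, since the commit does not say how conflicts are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    max_output_tokens: Optional[int] = None  # preferred name
    max_tokens: Optional[int] = None  # deprecated alias

    def __post_init__(self) -> None:
        # Backfill the unset side so old and new readers see the same value.
        if self.max_output_tokens is None:
            self.max_output_tokens = self.max_tokens
        # Assumption: when both are set, the preferred name wins and the alias follows it.
        self.max_tokens = self.max_output_tokens
```

A table over the four (max_output_tokens, max_tokens) combinations exercises every branch.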

Design reference: doc/working/context-management-workstreams/W1_*.md and
W1 ADR. Provider adapters (step 3) and create_agent_info (step 6) follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(W1 step 6): wire ModelCapacityResolver in create_agent_info, drop legacy max_tokens

Replaces the long-standing bug where `model_info['max_tokens']` (a deprecated
output cap, semantically wrong) was assigned to ContextManagerConfig.token_threshold
(an input/context budget). The fix wires ModelCapacityResolver into the
runtime path so the context manager receives a real input budget derived from
the capacity snapshot.

Changes in backend/agents/create_agent_info.py:

- Add _resolve_input_budget(model_info): pulls operator overrides from the
  new model_record_t capacity columns, calls resolve_capacity(...) with the
  CATALOG from backend.consts.capability_profiles, and returns
  snapshot.provider_input_limit_tokens.
- On ProviderCapabilityUnknown (uncataloged model with no operator-supplied
  hard capacity), falls back to a safe constant _TOKEN_THRESHOLD_LEGACY_FALLBACK
  (8192) so the migration window doesn't break existing setups. Logged
  prominently so admins know to backfill.
- create_agent_config: stops reading model_info['max_tokens'] and passes
  the resolved input_budget into ContextManagerConfig.token_threshold.
- create_model_config_list: passes all seven new capacity columns
  (context_window_tokens, max_input_tokens, max_output_tokens,
  default_output_reserve_tokens, tokenizer_family, capacity_source,
  capability_profile_version) through to the SDK ModelConfig so end-to-end
  capacity flow works.
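The fallback path in `_resolve_input_budget` can be sketched as a thin wrapper. The 8192 constant and the `ProviderCapabilityUnknown` name come from the commit; `resolve_input_budget`, the injected `resolve_limit` callable, and the `model_name` key are hypothetical.

```python
import logging
from typing import Any, Callable, Mapping

logger = logging.getLogger(__name__)

_TOKEN_THRESHOLD_LEGACY_FALLBACK = 8192


class ProviderCapabilityUnknown(Exception):
    """Stand-in for the resolver's typed error of the same name."""


def resolve_input_budget(
    model_info: Mapping[str, Any],
    resolve_limit: Callable[[Mapping[str, Any]], int],
) -> int:
    """Hypothetical sketch: resolved input budget, or a conservative fallback."""
    try:
        return resolve_limit(model_info)
    except ProviderCapabilityUnknown:
        # Uncataloged model with no operator capacity: compress early rather than overflow.
        logger.warning(
            "No capacity known for model %r; using fallback token threshold %d. "
            "Backfill context_window_tokens to remove this fallback.",
            model_info.get("model_name"),
            _TOKEN_THRESHOLD_LEGACY_FALLBACK,
        )
        return _TOKEN_THRESHOLD_LEGACY_FALLBACK
```

Note that the legacy `max_tokens` in `model_info` is never consulted, which matches the commit's removal of that misuse.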

This is the end of the legacy max_tokens-as-context-threshold confusion.
ModelConfig.max_tokens stays as a deprecated alias per W1 step 4; this commit
removes its only known misuse from the runtime path.

The fallback constant is intentionally conservative — it kicks compression
early for unmigrated models so behavior degrades gracefully rather than
overflowing provider context. W2 will subtract its 10% uncertainty reserve
on top of the resolver's output once enforcement phase begins.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(loop-engineering): add comprehensive insight report on Loop Engineering methodology and recommendations for Nexent's evolution

* docs: add W1 ADR to ADRs directory

Restore W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from doc/context-management-upgrade branch to context-management-workstreams/ADRs directory.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(W1 step 8): emit capacity snapshot fields in monitoring

Persist resolved model capacity snapshot metadata on model monitoring records so per-request telemetry can report total window, output reserve, safe input budget, source, tokenizer mode, unknown capabilities, and fingerprint.

- add nullable monitoring columns to ORM, fresh-install SQL, and idempotent upgrade migration
- bind resolved capacity snapshots from agent creation into SDK monitoring context
- enrich LLM, client-level, and record_model_call monitoring rows with snapshot fields
- cover enqueue and ORM payload behavior in SDK monitoring tests

Verification:
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/monitor/test_monitoring.py
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/core/models/test_capacity_resolver.py
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/agents/create_agent_info.py backend/database/db_models.py sdk/nexent/core/agents/agent_model.py sdk/nexent/core/agents/run_agent.py sdk/nexent/monitor/monitoring.py sdk/nexent/monitor/__init__.py

Co-Authored-By: Codex <codex@openai.com>

* feat(W1 step 3): surface provider-discovery capacity hints as candidates

Expose provider-supplied token-capacity metadata as advisory candidate fields in discovery responses without promoting them into persisted model records.

- add shared candidate extraction for common context, output, input, reserve, and tokenizer aliases
- wire SiliconFlow, DashScope, TokenPony, and ModelEngine adapters to attach provider_candidate hints when present
- keep prepare_model_dict from persisting provider_candidate fields automatically
- cover positive and no-hint paths for provider discovery
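Shared candidate extraction can be sketched as an alias table plus "first present alias wins". The payload key aliases below are hypothetical (each provider's real field names are not listed here); `provider_candidate` and the rule that it is not persisted come from the commit.

```python
from typing import Any, Dict, Mapping, Tuple

# Hypothetical alias table: canonical capacity field -> provider payload keys, in priority order.
_CAPACITY_ALIASES: Dict[str, Tuple[str, ...]] = {
    "context_window_tokens": ("context_window", "context_length", "max_context_length"),
    "max_input_tokens": ("max_input_tokens", "max_prompt_tokens"),
    "max_output_tokens": ("max_output_tokens", "max_completion_tokens"),
    "default_output_reserve_tokens": ("default_output_tokens",),
    "tokenizer_family": ("tokenizer",),
}


def extract_capacity_candidates(raw: Mapping[str, Any]) -> Dict[str, Any]:
    candidates: Dict[str, Any] = {}
    for field, aliases in _CAPACITY_ALIASES.items():
        for alias in aliases:
            if raw.get(alias) is not None:
                candidates[field] = raw[alias]
                break  # first alias present wins
    return candidates


def attach_provider_candidate(model: Dict[str, Any]) -> Dict[str, Any]:
    hints = extract_capacity_candidates(model)
    if hints:
        model["provider_candidate"] = hints  # advisory only
    return model


def persistable_fields(model: Mapping[str, Any]) -> Dict[str, Any]:
    # Candidates are never promoted into the persisted record automatically.
    return {k: v for k, v in model.items() if k != "provider_candidate"}
```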

Verification:
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/backend/services/providers/test_silicon_provider.py /home/feiran/nexent/test/backend/services/providers/test_dashscope_provider.py /home/feiran/nexent/test/backend/services/providers/test_tokenpony_provider.py /home/feiran/nexent/test/backend/services/providers/test_modelengine_provider.py /home/feiran/nexent/test/backend/services/test_model_provider_service.py::test_prepare_model_dict_does_not_persist_provider_capacity_candidates
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/services/providers/base.py backend/services/providers/silicon_provider.py backend/services/providers/dashscope_provider.py backend/services/providers/tokenpony_provider.py backend/services/providers/modelengine_provider.py

Co-Authored-By: Codex <codex@openai.com>

* feat(W1 step 7): expose capacity fields in Add/Edit Model forms

Add explicit model-capacity controls to model management so operators can promote known capacity values through the existing model create and update flows.

- extend frontend model types and service request/response mappings for capacity fields
- add shared capacity form controls with tokenizer autocomplete, source badge, profile version text, and legacy max_tokens warning
- wire capacity validation and operator payloads into Add/Edit Model dialogs
- localize labels, tooltips, source names, and validation messages in en/zh

Verification:
- npm run type-check
- node -e "const fs=require('fs'); for (const f of ['frontend/public/locales/en/common.json','frontend/public/locales/zh/common.json']) { JSON.parse(fs.readFileSync(f,'utf8').replace(/^\uFEFF/,'')); } console.log('locale json ok')"

Co-Authored-By: Codex <codex@openai.com>

* docs: review 5 findings (CM-017, CM-018, CM-021, CM-024, CM-025)

Review and accept decisions for 5 findings:
- CM-018: structural validation blocks commit, semantic quality routes to W15 SLO
- CM-021: source lineage + mandatory presence validation blocks, semantic coverage to W15
- CM-024: use claim-scoped production readiness terminology
- CM-017: finite initial conflict set with explicit unresolved failure
- CM-025: subagent as independent agent with parent_session_id, async tool delegation, no recursion

Updated: finding-review-decisions.md, findings-registry.md (20/26 complete),
W4, W6, W10, W11, W12, W13, parent plan.
Added: pending-findings-decision-sheet.md for decision tracking.

Remaining 6 findings (CM-009, CM-010, CM-014, CM-015, CM-022, CM-026)
pending individual discussion.

* docs: accept CM-026 decision — exclude unsupported modalities from Release 1 gates

Remove multimodal testing from Release 1 SLO gates. W15 covers text modality
only; add modality contracts when specific product requirements emerge.

Updated: finding-review-decisions.md, findings-registry.md (21/26 complete),
W15, W3, pending-findings-decision-sheet.md.

* docs: retire W7, merge checkpoints into W5 as compression.snapshot events

Architectural simplification: checkpoints are no longer an independent
subsystem (W7). Compression results are stored as compression.snapshot
events within the W5 execution event log. Recovery finds the latest
compression.snapshot event and replays subsequent events.

Eliminates:
- Independent checkpoint table and CAS concurrency control
- Redis checkpoint cache layer
- W8 checkpoint-specific validation
- CM-014 checkpoint schema migration (covered by CM-005)
- W7 publication outbox for cross-system consistency

Updated: W5 (compression.snapshot event type, recovery flow, dirty-state
flush), W6, W8, W9, W13, W14, W15, parent plan, README, review artifacts.
Deleted: W7_Durable_Multi_Worker_Context_State.md.
CM-014 marked N/A (22/26 findings complete).

* fix(W1): clarify optional capacity fields

* docs: accept CM-009 decision — defer workload envelopes until post-implementation measurement

Do not pre-define workload envelopes. After W1-W16 implementation, use W15
measurement infrastructure to collect real performance data and define
envelopes based on observed data. No production-scale claim until envelopes
are defined. Aligns with CM-004 (measure before optimizing) and CM-011
(evidence-based gates).

Progress: 23/26 findings complete.

* docs: accept CM-010 decision — defer numeric targets until post-implementation measurement

Do not pre-define numeric availability, RPO, RTO, rebuild time, queue lag,
or storage capacity targets. After W1-W16 implementation, use W15
measurement infrastructure to collect real recovery/availability data per
topology and define targets based on observed data. No production-scale
claim until targets are defined. Aligns with CM-009 (measure before
defining envelopes) and CM-011 (evidence-based gates).

Progress: 24/26 findings complete.

* docs: accept CM-015 decision — remove content hashing, use O(1) metadata validation

W7 retirement eliminates the primary O(history) hashing consumer. Replace
content hashing with metadata-based validation at three points:
1. compression.snapshot: partial_after_erasure + version fields
2. W6 materialized cache: snapshot validity + event count + version fields
3. Physical erasure: one-time partial_after_erasure flag

No Merkle trees or segmented hashing needed. Storage-layer integrity handled
by database checksums, not W8.

Progress: 25/26 findings complete.

* fix(web): bind production server to all interfaces

* docs: accept CM-022 decision — consolidate decision traces into unified OpenTelemetry spec

Consolidate all decision trace requirements (W5, W6, W10, W15) into a single
unified telemetry/observability specification (low priority, post-core).
Use OpenTelemetry-style spans/attributes/events collected by external
observability infrastructure, not product-internal persistence.

Updated: W15 (replace decision trace persistence with OTel output),
parent plan (replace decision trace references with unified telemetry spec),
finding-review-decisions.md, findings-registry.md (26/26 complete),
pending-findings-decision-sheet.md.

All 26 findings now reviewed and decided.

* fix(W1 step 7): expose capacity fields in ProviderConfigEditDialog

Step 7 added capacity controls to ModelEditDialog (the OpenAI-API-Compatible
"custom model" edit path) but missed ProviderConfigEditDialog, the dialog
opened by the per-model gear icon under provider-categorized sections
(SiliconFlow / DashScope / TokenPony / ModelEngine). For any model whose
model_factory matches a recognized provider — including the W1 catalog
keys 'dashscope' / 'silicon' / 'tokenpony' — that gear icon was the only
edit path, leaving operators no way to set context_window_tokens et al.

Changes:
- ProviderConfigEditDialog: accept optional initialCapacity and
  hideCapacityFields props; render ModelCapacityFields when supported;
  include capacity payload in onSave callback shape.
- modelService.updateBatchModel: accept and forward the 6 capacity
  fields (context_window_tokens, max_input_tokens, max_output_tokens,
  default_output_reserve_tokens, tokenizer_family, capacity_source) to
  the existing batch_update_models endpoint, which already passes through
  arbitrary update_data per backend/services/model_management_service.py
  line 347.
- ModelDeleteDialog single-model gear path: pass current capacity values
  from selectedSingleModel as initialCapacity, and forward saved capacity
  fields into the updateBatchModel call.
- ModelDeleteDialog provider-level "Edit Config" path: pass
  hideCapacityFields={true} since handleProviderConfigSave applies
  settings batch-wise to all models from one provider and per-model
  capacity is not a batch concept.

No behavior change for callers that don't pass initialCapacity (backward
compatible). Verified with npm run type-check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: stabilize test_model_provider_service against dual-import sys.modules pollution

Two tests (test_get_models_llm_success, test_get_models_embedding_success)
failed intermittently when test_model_provider_service.py ran after
test_capacity_resolver.py or test_silicon_provider.py. Root cause:
silicon_provider is loaded under two distinct sys.modules keys —
`services.providers.silicon_provider` (the path production code uses) and
`backend.services.providers.silicon_provider` (the path some test files
use). Each binding gets its own `SILICON_GET_URL` attribute because
`silicon_provider.py` does `from consts.provider import SILICON_GET_URL`,
which copies the value into the importing module's namespace.

When both keys are present, mock.patch targeting only the `backend.` path
silently fails to override the value used by the production code path
that SiliconModelProvider.get_models executes.

Fix: introduce _patch_provider_module_constant context manager that
patches the named attribute on every loaded copy of the module. Apply to
all four SILICON_GET_URL mock.patch sites in this file.
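The helper's idea, patching an attribute on every loaded copy of a module, can be sketched with `ExitStack` and `mock.patch.object`. `patch_module_constant` is a hypothetical equivalent of `_patch_provider_module_constant`, assuming the caller passes the candidate `sys.modules` keys explicitly.

```python
import sys
from contextlib import ExitStack, contextmanager
from typing import Any, Iterator, Sequence
from unittest import mock


@contextmanager
def patch_module_constant(module_names: Sequence[str], attribute: str, value: Any) -> Iterator[None]:
    """Hypothetical sketch: patch `attribute` on every loaded alias of a module."""
    with ExitStack() as stack:
        for name in module_names:
            module = sys.modules.get(name)
            # Skip aliases that were never imported, so this is safe when only one path is loaded.
            if module is not None and hasattr(module, attribute):
                stack.enter_context(mock.patch.object(module, attribute, value))
        yield
```

`ExitStack` unwinds in reverse order, so the original value is restored correctly even when both keys point at the same module object.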

Verification:
- 289 tests pass under the previously-failing combined order:
  test/sdk/core/models/test_capacity_resolver.py +
  test/sdk/monitor/test_monitoring.py +
  test/backend/services/providers/ +
  test/backend/services/test_model_provider_service.py

The helper is order-independent and safe even when one of the two sys.modules
paths is absent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(W1): record post-acceptance known limitations and open W17 for capacity-suggestion UX

W1 ADR additions:
- KL-1: catalog miss for default model_factory='OpenAI-API-Compatible'.
  Manual-add LLM rows skip the embedding-only _infer_model_factory path,
  fall through to ProviderCapabilityUnknown, and lose catalog values.
  Documented with the end-to-end workaround verified on 2026-06-15 for
  glm-5.1 (catalog hit confirmed via direct SQL UPDATE).
- KL-2: provider-level batch Edit Config dialog hides capacity controls
  because they are per-model. Per-model gear icon path exposes them
  (fix landed 2026-06-16).

New W17 workstream proposal:
- POST /api/v1/models/suggest-capacity endpoint and frontend wiring.
- Catalog fuzzy match + provider discovery, returns placeholders for the
  capacity form. Operator accepts → saved with capacity_source='operator'.
- Subsumes the LLM gap in _infer_model_factory by replacing it with a
  shared host-to-provider map.
- Phased rollout behind a feature flag, with an SLO target of a >=70% match
  rate on new manual-add LLM rows.
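To make the suggestion flow concrete, here is a rough sketch of a host-to-provider lookup followed by a fuzzy catalog match. The map entries, catalog shape, `suggest_capacity` name, and the 0.8 similarity cutoff are all hypothetical; `difflib.get_close_matches` stands in for whatever matcher W17 actually adopts:

```python
import difflib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical shared host-to-provider map.
HOST_TO_PROVIDER: Dict[str, str] = {
    "api.siliconflow.cn": "silicon",
    "api.openai.com": "openai",
}

# Hypothetical catalog: provider -> lowercase model name -> (context_window, max_output).
Catalog = Dict[str, Dict[str, Tuple[int, int]]]


@dataclass(frozen=True)
class CapacitySuggestion:
    provider: str
    catalog_model: str
    context_window_tokens: int
    max_output_tokens: int


def infer_provider(base_url: str) -> Optional[str]:
    """Map the endpoint host to a provider key; None when the host is unknown."""
    host = (urlparse(base_url).hostname or "").lower()
    return HOST_TO_PROVIDER.get(host)


def suggest_capacity(base_url: str, model_name: str, catalog: Catalog,
                     cutoff: float = 0.8) -> Optional[CapacitySuggestion]:
    """Return placeholder capacity values, or None so the form stays empty."""
    provider = infer_provider(base_url)
    if provider is None:
        return None
    models = catalog.get(provider, {})
    matches = difflib.get_close_matches(model_name.lower(), list(models), n=1, cutoff=cutoff)
    if not matches:
        return None
    context_window, max_output = models[matches[0]]
    return CapacitySuggestion(provider, matches[0], context_window, max_output)
```

Returning None instead of a best guess keeps the no-match path harmless: the operator fills the fields by hand, exactly as today.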

Workstream README updated to index W17 under Model Capacity and Request
Safety, with a dependency note linking to KL-1.

The ADR remains Accepted. KL-1/KL-2 are post-acceptance discoveries that
trigger the new workstream rather than reopen the ADR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update W3 with dispatch path analysis and bypass elimination plan

Add current dispatch path analysis: 1 chokepoint (openai_llm.py:186),
9 trusted paths, 2 production bypasses (B1: llm_utils.py, B2:
conversation_management_service.py).

Split step 9 into sub-steps:
- 9a: Fix B1 (system prompt generation bypass)
- 9b: Fix B2 (title generation bypass)
- 9c: Credential isolation (architecture layer)

Add bypass files to repository touchpoints.
Add bypass elimination tests.

* docs(W17): integrate post-acceptance workstream into both production plans

Per classification decision (Option A): W17 sits in the existing "Model
Capacity and Request Safety" module — same owners as W1-W3 — but is marked
Medium / post-acceptance to distinguish it from the Blocker-level original
freeze. This avoids creating a new module table for a single workstream
while keeping the design-freeze boundary intact.

Both plans:
- §1.2 (en) / §1.1 (zh) per-workstream table: add W17 row labeled
  "Medium (post-acceptance)" / "中 (落地后增加)" linking to its spec.
- New §1.4 (en) / §1.3 (zh) "Post-Acceptance Additions" section: explain
  that W17 was opened after the 2026-06-12 design freeze, triggered by KL-1
  surfaced during the glm-5.1 end-to-end test. Document the KL- vs CM-
  finding prefix convention.
- §2.3.1 module section: add a full W17 entry after W3 with status, problem,
  solution, proof, acceptance criteria, and the "post-acceptance, unscheduled"
  schedule note.
- §3 Phase plan table: add a sixth row "Post-acceptance follow-ups" /
  "落地后增加" decoupled from Phase 0-5, with a clarifying paragraph that
  W17 and future KL-triggered work do not move the August 7 milestone.

Frozen design-phase documents are NOT modified to avoid rewriting history:
- context-management-weekly-design-summary-zh.md (2026-06-08 to 06-12 status)
- review/findings-registry.md (26 CM- findings closed)
- review/over-engineering-secondary-review.md ("no new unconditional
  workstream"; W17 is conditional on observed KL-1)
- All review/phase*-review.md per-W reviews
- W1_HANDOFF_remaining_steps_3_7_8.md (historical handoff, steps closed)

The over-engineering guardrail still applies: W17 is conditional on the
specific named limitation KL-1, not a new unconditional workstream.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(W1 step 7): unify max_tokens with capacity panel and migrate legacy on edit

Frontend UX corrections discovered during W1 end-to-end testing:

1. Add Model dialog (single model)

The standalone "Max Tokens *" field has the same semantic meaning as
max_output_tokens in the capacity panel (W1 step 4 makes them aliases on
the SDK side). Showing both was confusing and forced operators to type the
same number twice. For LLM/VLM types the legacy field is now removed:

- ModelCapacityFields gains a `formMode` prop. In 'add' mode the panel
  renders as a flat labelled section (no Collapse, no "empty hint"
  alert) and hides defaultOutputReserveTokens; required fields render a
  red asterisk and are enforced through validateCapacityForm.
- ModelAddDialog passes formMode='add' with
  requiredFields=['contextWindowTokens', 'maxInputTokens']. The legacy
  Max Tokens input renders only when supportsCapacityFields is false
  (voice/rerank types still use it).
- isFormValid drops isValidMaxTokens(form.maxTokens) when
  supportsCapacityFields is true; capacity validation is the source of
  truth.
- The connectivity-verify config now reads form.maxOutputTokens for
  LLM/VLM (with parseMaxTokens fallback) since the standalone field is
  gone.
- buildCapacityPayload mirrors maxOutputTokens into the deprecated
  maxTokens column so legacy readers that haven't been migrated yet
  still see the value, removing an implicit dependency on the SDK
  Pydantic alias firing on every backend code path.

2. Edit Model dialog yellow deprecation warning

The warning "max_tokens 已废弃,请使用 max_output_tokens" ("max_tokens is
deprecated; use max_output_tokens") fired even
after the user typed a new max_output_tokens value, because the trigger
read model.maxTokens / model.maxOutputTokens props instead of the live
form state. capacityFormFromModel now auto-promotes a legacy
model.maxTokens value into the form's maxOutputTokens on load so the
operator sees the value pre-populated, and the warning condition adds a
"&& !form.maxOutputTokens" check so it disappears as soon as the form
has a value. Saving from there writes to the max_output_tokens column,
which permanently clears the warning next time the row is loaded.

Both invocations of ModelCapacityFields in the ModelEditDialog source
(the ModelEditDialog and ProviderConfigEditDialog components) received
the same correction.
ProviderConfigInitialCapacity now exposes maxTokens so the helper can
auto-migrate from the per-model gear path too; ModelDeleteDialog
forwards selectedSingleModel.max_tokens.

Locale strings added:
- model.dialog.capacity.error.requiredMissing (en/zh)

Verified: npm run type-check passes; locale JSON parses.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(W1 step 7): Add panel description gone; tokenizer shares row; Edit drops legacy max_tokens

Two more UX corrections from W1 end-to-end testing:

1. Add Model panel cosmetic

The "Optional Capacity Settings — used to override or confirm model
capacity; leaving it empty will not block adding the model" header text
sat above the capacity inputs in add mode, where the fields are part of
the required form, so the "optional" framing was misleading and the body
label/description duplicated information already shown on each input.
Drop the header block in add mode; render content directly.

Layout had four numeric inputs in a 2-column grid then a full-width
tokenizer field underneath. That made row 1 = (context, input), row 2 =
(output, ___), row 3 = tokenizer alone — an awkward orphan slot in row
2. In add mode the tokenizer now slots into the grid next to
maxOutputTokens (no defaultOutputReserveTokens shown here), giving two
tidy rows. Edit mode is unchanged: defaultOutputReserveTokens takes the
fourth slot and tokenizer renders full-width below.

2. Edit Custom Model still showed both max_output_tokens and max_tokens

Step 7 only stopped rendering the legacy maxTokens field in Add Dialog.
The Edit Dialog continued to render it alongside the capacity panel's
maxOutputTokens, defeating the merge the Add fix made. ModelEditDialog
now hides the standalone maxTokens field when supportsCapacityFields is
true, drops the corresponding isValidMaxTokens validation from
isFormValid, and falls back to form.maxOutputTokens for the
connectivity-probe maxTokens parameter (with parseMaxTokens(form.maxTokens)
fallback so any pre-existing legacy value still works).

Verified with npm run type-check; no locale changes in this commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: clarify W4 step 4 and step 6 implementation details

Step 4: Clarify that W4 verifies W5 schemas include identity columns
rather than adding them (W5 owns the schema definition).

Step 6: Keep deprecated APIs with a deprecation notice and remove them in
the next version, rather than removing them immediately.

* fix(W1 step 7): required = context_window + max_output; drop Collapse; consistent across Add/Edit

Corrections after the previous round's UX review:

1. Required fields were wrong.

Previous commit required (contextWindowTokens, maxInputTokens). The
correct W1 requirement is (contextWindowTokens, maxOutputTokens) — the
two values that bound the request budget end-to-end. max_input_tokens
stays optional because almost no real provider exposes a distinct hard
input limit; the resolver falls back to context_window - requested_output
when it's null. Updated three call sites:

- ModelAddDialog: requiredFields and validateCapacityForm both
  ['contextWindowTokens', 'maxOutputTokens'].
- ModelEditDialog inner panel: same requiredFields + same validation set.
- ProviderConfigEditDialog inner panel: same.
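The required-field rule and the resolver fallback described above fit in a few lines. This is a sketch under assumptions: the `CapacityForm` type and function names are hypothetical, and when `max_input_tokens` is set it is returned as-is, mirroring the text rather than any extra clamping the real resolver may apply:

```python
from dataclasses import dataclass
from typing import List, Optional

# The two values that bound the request budget end to end.
REQUIRED_CAPACITY_FIELDS = ("context_window_tokens", "max_output_tokens")


@dataclass
class CapacityForm:
    context_window_tokens: Optional[int] = None
    max_output_tokens: Optional[int] = None
    max_input_tokens: Optional[int] = None  # optional: few providers expose it


def missing_required(form: CapacityForm) -> List[str]:
    """Names of required fields that are still empty."""
    return [name for name in REQUIRED_CAPACITY_FIELDS if getattr(form, name) is None]


def effective_input_limit(form: CapacityForm, requested_output_tokens: int) -> int:
    """Use the explicit input limit if present, else context_window - requested_output."""
    missing = missing_required(form)
    if missing:
        raise ValueError(f"missing required capacity fields: {missing}")
    if form.max_input_tokens is not None:
        return form.max_input_tokens
    return form.context_window_tokens - requested_output_tokens
```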

2. Edit dialogs no longer Collapse the capacity panel.

With context_window and max_output now required for both add and edit,
hiding the inputs behind a Collapse hides the red asterisks until the
user clicks the title. ModelCapacityFields drops the Collapse entirely
and renders flat in both modes. The 'add' vs 'edit' formMode prop now
only differentiates whether default_output_reserve_tokens is shown (it
stays in edit, hidden in add) and where the tokenizer field sits
(beside max_output in add, full-width in edit).

3. Empty-state hint suppressed when requiredFields is non-empty.

The locale string `capacity.emptyHint` advised "you can fill these later",
which contradicts required asterisks. Hide it whenever any requiredFields
are passed; show only for the legacy advisory case.

Verified npm run type-check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: refine W5 implementation plan with sub-steps and clarifications

- Split step 1 into 3 ADR sub-steps (taxonomy/schema, ordering/idempotency, evolution)
- Split step 3 into 4 code path sub-steps (agent loop, tool execution, error/cancel, answer)
- Add 4-phase migration plan to step 7 (shadow, read switch, write switch, remove direct writes)
- Clarify new event-log database module responsibilities in Repository Touchpoints
- Add performance baseline test requirement

* docs(W17): close three self-review gaps before implementation

Applied the W1 retrospective checklist to W17 (which I wrote after the
retrospective and which still made the mistakes it identified). Three corrections:

1. Repository touchpoints missed sibling frontend components.

The original list named ModelAddDialog, ModelEditDialog, and
ModelCapacityFields but omitted ProviderConfigEditDialog (the per-model
gear icon dialog) and ModelDeleteDialog (the provider browser). Both
are valid model-add entry points and the suggestion logic must reach
them, or W17 reproduces W1 step 7's "only ModelEditDialog got the new
fields" miss.

2. Frontend implementation plan was 3 items hiding 7 concerns.

Expanded into 7 numbered items grouped by concern: service layer (4),
form state machine with suggested/operator distinction (5), debounce
trigger and no-match graceful fallback (6), match_explanation Alert
rendering (7), coverage of all three add paths including provider
browser (8), error-mode contract (9), and locale strings (10).

3. No operational dependencies section.

Added a table covering which containers need rebuilding (nexent-runtime
+ nexent-northbound + nexent-config + nexent-mcp for backend; nexent-web
for frontend; nexent-postgresql untouched), new env var
CAPACITY_SUGGESTION_ENABLED, optional per-tenant flag in tenant_config_t
for staged rollout, monitoring dashboards to add, rollout sequence
(staging → one internal tenant → paid → all), and rollback procedure
(env var off → no schema cleanup needed).

These three corrections come from the W1 spec review checklist that
this commit was the trigger to formalize.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(W2 review): formalize six-item checklist from W1 retrospective; apply to W2

Two new documents:

SPEC_REVIEW_CHECKLIST.md — the reusable artifact.
Codifies the W1 post-acceptance retrospective's six lessons as a
checklist with concrete sub-questions per item:

  1. User Journey — who sees what change end to end
  2. Frontend Step Decomposition — ≥3 sub-items covering state /
     visual / service / validation / migration / siblings
  3. End-to-End Demo Script in Acceptance — concrete, copy-pasteable,
     with negative path
  4. Operational Dependencies — containers / migrations / env vars /
     flags / runbook / monitoring
  5. Sibling Components Enumerated — every dialog / function / column /
     module-key sibling named or explicitly out of scope
  6. Reverse-Test "Can the user actually use this" — operator can know
     feature is active, can reach values from UI, can observe fallback

W2_REVIEW.md — applies the checklist to W2 + the four reader-surfaced
issues the user spotted independently:

  Item 1: User Journey — 🔴 missing Operator-Visible Effects section
  Item 2: Frontend Decomposition — 🔴 no decision on UI for
          soft_limit_ratio / per-agent override
  Item 3: End-to-End Demo — 🟡 abstract, demo script proposed
  Item 4: Operational Dependencies — 🟡 nothing-to-do but unstated
  Item 5: Sibling Components — 🔴 six current local-reserve sites in
          agent_context.py not enumerated; W2→compaction handoff missing
  Item 6: Reverse Test — 🟡 no operator-visible activity indicator

  Issue A: soft_limit_ratio default unspecified — recommend 0.8
  Issue B: requested_output_tokens override location undefined —
           per-agent (DB column + agent-edit UI) vs per-request (API
           body) are two distinct contracts buried in one sentence
  Issue C: W2 ↔ W13 compaction-model relationship undefined — each
           model call needs its own W1→W2 chain; W2 spec must say
           snapshots are per-model, not shared (same defect class
           as the W1 catalog problem)
  Issue D: Step 5 "consistent" semantics ambiguous — clarify it's the
           CM-013 trusted-dispatch enforcement contract, not a rename

Verdict: W2 spec is not Ready to Implement; 7 of 10 items need updates.
None invalidate the architecture — they are under-specifications that
would reproduce W1-style post-acceptance surprises if shipped to
implementation as-is.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(review): convert W2 post-acceptance review to CM-NNN format under review/

Removed W2_REVIEW.md from the workstreams folder — wrong location and
wrong format, did not follow the established phase2-w*-review.md
convention (concise per-W file + central findings-registry.md).

Re-published in the correct shape:

- review/findings-registry.md: added CM-027 through CM-030 with
  Severity / Delivery classification / Affected documents / Description /
  Minimum non-over-engineered response columns matching the existing 26
  design-phase entries. Severity Summary updated (was 4/10/7/5 = 26,
  now 4/12/9/5 = 30).

- review/phase6-w2-review.md: new file in the same concise format as
  phase2-w*-review.md. Phase 6 is defined here as the post-acceptance
  review track opened after the W1 retrospective, distinct from Phase 2
  (design-phase per-W reviews) — same numbering convention, different
  trigger.

The four findings translate the W1 retrospective lessons + user-surfaced
W2 issues into CM-style entries:

  CM-027 Medium — soft_limit_ratio default unspecified; min response
                  set default 0.8 with per-tenant override path.
  CM-028 Medium — per-agent vs per-request override are two contracts in
                  one sentence; min response specify both and decide W2 scope.
  CM-029 High   — per-model snapshot rule unstated; W13 compaction call
                  needs its own W1->W2 chain (same defect class as W1 KL-1).
  CM-030 High   — Step 5 "consistently" is the CM-013 trusted-dispatch
                  enforcement contract, not a rename; min response add
                  server-side assertion + negative test.

The W17 follow-up workstream's KL-1/KL-2 references in W1 ADR and the
production plans remain in the KL- namespace for now; migrating those to
CM- can happen in a separate consistency pass if desired.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: refine W6 with projection priority, ContextItem scope, and implementation clarifications

- Add projection implementation priority (Release 1 required/optional/deferred)
- Clarify which projections produce full ContextItem vs simple records
- Define 'zero semantic mismatch' criteria for chat shadow comparison
- Clarify W8 validation call pattern in Phase 3 step 3
- Add performance baseline test requirement in Phase 4
- Clarify backend projection registry responsibilities

* docs: update W8 to align with CM-015 decision (remove content hashing)

Replace content-based hashing with O(1) metadata-based validation:
- compression.snapshot: partial_after_erasure flag + version field comparison
- W6 materialized projections: snapshot validity + event count + version fields
- Physical erasure: one-time partial_after_erasure flag propagation

Updates:
- Validity Contract: remove content hash, add metadata validation inputs
- Implementation Plan step 2: replace streaming hashing with metadata validation
- Implementation Plan step 4: use DerivedStateValidator (not CheckpointValidator)
- Implementation Plan step 7: 'derived state' instead of 'checkpoint'
- Validation and Invalidation Delivery: remove canonical serialization/hash algorithm
- Add CM-015 finding reference
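For illustration, an O(1) metadata check in the spirit of these updates might look like the following. Only the `partial_after_erasure` flag, the version comparison, the event count, and the `DerivedStateValidator` name come from the text above; the field names, the rule that a flagged snapshot is not reused, and the method signatures are assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotMeta:
    version: int
    partial_after_erasure: bool  # set once when physical erasure touches the source


@dataclass(frozen=True)
class ProjectionMeta:
    snapshot_version: int  # snapshot version the projection was built from
    event_count: int       # events folded in when it was materialized
    schema_version: int


class DerivedStateValidator:
    """Validates derived state from metadata only; no content hashing."""

    def __init__(self, current_schema_version: int) -> None:
        self.current_schema_version = current_schema_version

    def snapshot_valid(self, snapshot: SnapshotMeta, expected_version: int) -> bool:
        # Assumption: a snapshot flagged after erasure is rebuilt, not reused.
        return snapshot.version == expected_version and not snapshot.partial_after_erasure

    def projection_valid(self, projection: ProjectionMeta, snapshot: SnapshotMeta,
                         expected_version: int, current_event_count: int) -> bool:
        return (
            self.snapshot_valid(snapshot, expected_version)
            and projection.snapshot_version == snapshot.version
            and projection.event_count == current_event_count
            and projection.schema_version == self.current_schema_version
        )
```

Every comparison is on a handful of integers and a flag, so the cost does not grow with content size.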

* docs: unify finding namespace (KL-* → CM-*), close 9 review decisions, fix W13 dep stale W7

Three coordinated cleanups in one commit:

1. KL-* → CM-* migration (consistency with established review namespace)

The KL- prefix was a one-off I introduced earlier to mark post-acceptance
findings as distinct from the 26 design-phase CM- findings. Per the
established review-folder convention (see review/findings-registry.md +
review/finding-review-decisions.md), all findings should share one CM-NNN
namespace regardless of when they were discovered. Renames:

  KL-1 → CM-031 (catalog miss for default model_factory)
  KL-2 → CM-032 (provider-level batch dialog cannot host per-model capacity)

Updated references in: W1 ADR (Known Limitations section, kept the
"formerly KL-1/KL-2" parenthetical as an audit trail), W17 spec,
context-management-production-plan.md and -zh.md (§1.4 / §1.3),
README workstream index W17 row, SPEC_REVIEW_CHECKLIST.md, and
review/phase6-w2-review.md.

Removed the "落地后局限使用 KL-N 前缀" (post-acceptance limitations use
the KL-N prefix) explanation from both production plans since the
namespace is now unified.

2. CM-027 through CM-032 added to review/finding-review-decisions.md

Six new finding-decision sections written in the same format the team
established for CM-001 through CM-026: Decision / Approved minimum /
Rationale / Explicitly out of scope / Updated documents. Covers:

  CM-027 W2 soft_limit_ratio default = 0.8
  CM-028 requested_output_tokens override = per-agent column + per-request
         API field, two distinct contracts
  CM-029 Per-model snapshot rule for secondary model dispatch (W13)
  CM-030 W2 Step 5 = CM-013 trusted-dispatch enforcement, not rename
  CM-031 catalog miss for default model_factory (formerly KL-1)
  CM-032 provider-level batch dialog cannot host per-model capacity
         (formerly KL-2)

3. README W13 dependency W7 → W5

After the team's W7 retirement merge, README line 49 still listed
W13's dependencies as "W2, W3, W7". Updated to "W2, W3, W5" since
W7's checkpoint/snapshot responsibilities are now W5
compression.snapshot events.

4. findings-registry.md Severity Summary updated

Was 4/12/9/5 = 30 after merge. After adding CM-031 (Medium) and CM-032
(Low), now 4/12/10/6 = 32.

5. English production-plan W7 residuals checked

The four W7 mentions remaining in context-management-production-plan.md
(workstream-table row, w7 anchor, retired heading, retirement-context
bullet listing what is NOT being adopted from W7) are intentional
historical markers in the W7 retirement section and were left in place.

Net change: ~20 lines across 9 files, no code, no migration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update W9 with terminology fixes, resolve_ambiguous_effect, and subagent conflict check

- Replace 'checkpoint' with 'compression.snapshot' throughout
- Add resolve_ambiguous_effect to implementation order (step 4)
- Add subagent conflict check: reject mutating lifecycle operations when
  parent session has pending subagent sessions, even after parent run's
  active_run_id is cleared (async subagent scenario)
- Add subagent conflict test
- Add subagent session query to repository touchpoints
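A sketch of the subagent conflict check, assuming a session record that tracks its active run and its pending subagent sessions. The `Session` shape, the operation names in `MUTATING_OPERATIONS`, and `LifecycleConflict` are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical set of lifecycle operations that mutate session history.
MUTATING_OPERATIONS = frozenset({"delete", "rewind", "compact"})


class LifecycleConflict(Exception):
    """Raised when a mutating lifecycle operation would race live work."""


@dataclass
class Session:
    session_id: str
    active_run_id: Optional[str] = None
    pending_subagent_ids: Set[str] = field(default_factory=set)


def check_lifecycle_operation(session: Session, operation: str) -> None:
    if operation not in MUTATING_OPERATIONS:
        return  # read-only operations never conflict
    if session.active_run_id is not None:
        raise LifecycleConflict(f"{session.session_id}: run {session.active_run_id} is active")
    # An async subagent can outlive the parent run, so a cleared
    # active_run_id alone is not enough to allow mutation.
    if session.pending_subagent_ids:
        pending = ", ".join(sorted(session.pending_subagent_ids))
        raise LifecycleConflict(f"{session.session_id}: pending subagents {pending}")
```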

* docs: refine W10 with deprecation notice, subagent policy independence, and performance tests

- Step 7: Mark bypass paths as deprecated (not immediate removal)
- Add Subagent Policy Independence section: subagents resolve their own
  W10 policy; parent policy governs subagent result integration
- Add performance baseline test requirement for policy resolution and
  context selection latency

* docs: refine W11 with subagent reducer independence and step 3 clarification

- Step 3: Clarify deterministic reducers (structured, pointer) generate on
  demand; semantic reducers (compressed) cache at creation/update since
  regeneration involves LLM calls
- Add Subagent Reducer Independence section: subagents use their own reducer
  chain; parent reducers do not apply to subagent internal context
- Add performance baseline tests to tests section (lower priority, after
  functional implementation is stable)

* docs: refine W12 with offload threshold clarification, subagent artifact isolation, and performance tests

- Step 6: Replace 'observation limits' with 'offload thresholds' — outputs
  exceeding threshold are stored as artifacts with pointers (full content
  preserved), not truncated. Context space decisions remain with W10/W3.
- Add Subagent Artifact Isolation section: subagent artifacts scoped to
  subagent session; parent cannot directly access subagent artifacts.
- Add performance baseline tests (lower priority, after functional
  implementation is stable).
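The offload rule can be sketched as follows: output above the threshold is stored whole in a session-scoped artifact store and replaced by a pointer. The character-based threshold, the `ArtifactPointer` fields, and the plain dict standing in for the artifact store are assumptions for illustration:

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class ArtifactPointer:
    artifact_id: str
    size_chars: int
    preview: str  # short prefix so the model can decide whether to fetch the rest


def offload_if_large(output: str, threshold_chars: int,
                     session_store: Dict[str, str],
                     preview_chars: int = 200) -> Union[str, ArtifactPointer]:
    """Return output unchanged, or store it in full and return a pointer."""
    if len(output) <= threshold_chars:
        return output
    artifact_id = hashlib.sha256(output.encode("utf-8")).hexdigest()[:16]
    # The full content is kept; nothing is truncated.
    session_store[artifact_id] = output
    return ArtifactPointer(artifact_id, len(output), output[:preview_chars])
```

Giving each subagent session its own `session_store` is one way to get the isolation described above: the parent sees only what the subagent returns, never the subagent's store.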

* docs: update W13 with current state gap analysis and implementation refinements

- Add Current State and Gap Analysis section: maps current agent_context.py
  implementation against W13 requirements, identifies 21 gaps (16 critical)
  and 5 existing strengths
- Add Compression Trigger Conditions: W2 soft_limit_ratio as primary trigger,
  two-phase thresholds as implementation details
- Add Fallback Model Selection Strategy: primary → fallback → W11 hard
  reduction cascade
- Step 4: Add measurable progress criteria (compressed tokens < source tokens,
  reject with no_progress if not)
- Add Subagent Compression Independence section: subagent sessions use own
  CompactionPolicy independently
- Add performance baseline tests (lower priority, after functional
  implementation is stable)
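The fallback cascade and the step 4 progress criterion combine naturally. A minimal sketch, assuming compressors are plain callables and token counting is injected; all names are hypothetical, and the broad `except` stands in for whatever model-call errors the real code distinguishes:

```python
from typing import Callable, List, Tuple

Compressor = Callable[[str], str]


def compress_with_cascade(source: str,
                          count_tokens: Callable[[str], int],
                          primary: Compressor,
                          fallback: Compressor,
                          hard_reduce: Compressor) -> Tuple[str, str, List[str]]:
    """Try the primary model, then the fallback model, then hard reduction.

    Returns (stage_used, result, rejection_reasons).
    """
    source_tokens = count_tokens(source)
    rejections: List[str] = []
    for stage, compress in (("primary", primary), ("fallback", fallback)):
        try:
            candidate = compress(source)
        except Exception as exc:  # model call failed: move to the next stage
            rejections.append(f"{stage}: error {exc!r}")
            continue
        if count_tokens(candidate) < source_tokens:
            return stage, candidate, rejections
        rejections.append(f"{stage}: no_progress")
    return "hard_reduction", hard_reduce(source), rejections
```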

* docs: refine W14 with deprecation notice, subagent governance, and performance tests

- Step 9: Mark raw/direct write paths as deprecated (not immediate removal)
- Add Subagent Governance section: subagent sessions apply W14 internally using
  their own agent configuration; subagent final answer is already governed
  output; parent W10 policy governs integration; W14 does not re-redact
  already-redacted content
- Add performance baseline tests for redaction latency and deletion
  propagation latency (lower priority, after functional implementation)

* docs: clarify W15 step 1 baseline timing and performance coordination

- Step 1: Clarify that baseline measurements should be established before
  W1-W14 implementation starts (required to quantify improvement)
- Required Deliverables: Add note that W15 coordinates performance baseline
  tests across W5, W6, W10, W11, W12, W13, and W14 (lower priority but
  W15 defines measurement standards and targets)

* docs: add W16 subagent cache optimization and performance baseline priority

- Add Subagent Cache Optimization section: subagent sessions apply W16
  independently using their own agent configuration; cache partition plan
  scoped to subagent session
- Add note that repeated-turn performance baseline tests are lower priority
  (after functional implementation is stable)

* docs: renumber W-IDs to match new development sequence

Renumbered all W-ID documents to follow the optimized development order:

Original → New mapping:
- W1 (Capacity Config) → W1 (unchanged)
- W2 (Safety Reserve) → W2 (unchanged)
- W4 (Tenant Isolation) → W3
- W5 (Event Log) → W4
- W6 (History Separation) → W5
- W8 (Cache Validation) → W6
- W9 (Lifecycle APIs) → W7
- W10 (Unified Policy) → W8
- W11 (Progressive Reduction) → W9
- W12 (Output Control) → W10
- W14 (Trust/Redaction) → W11
- W13 (Reliable Compaction) → W12
- W15 (Quality SLOs) → W13
- W16 (Cache-Aware Assembly) → W14
- W3 (Guaranteed Fit) → W15

This reordering ensures:
- No forward dependencies (each W-ID only depends on earlier W-IDs)
- W15 (Guaranteed Fit) comes after W14 (Cache-Aware Assembly) which it consumes
- W12 (Reliable Compaction) comes after W11 (Trust/Redaction) which it depends on
- W3 (Tenant Isolation) comes before W15 (Guaranteed Fit) which needs it
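The "no forward dependencies" property is easy to check mechanically. A sketch, assuming dependencies are kept as a hypothetical mapping from W-ID to prerequisite W-IDs; the example uses only the edges named above:

```python
from typing import Dict, Iterable, List, Tuple


def w_number(w_id: str) -> int:
    """'W15' -> 15."""
    return int(w_id.lstrip("W"))


def forward_dependencies(deps: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Return (workstream, prerequisite) pairs where the prerequisite is not earlier."""
    return sorted(
        (w_id, dep)
        for w_id, prereqs in deps.items()
        for dep in prereqs
        if w_number(dep) >= w_number(w_id)
    )
```

Under the new numbering, `{"W15": ["W14", "W3"], "W12": ["W11"]}` yields no violations; the same three edges under the old numbering (W3 on W16 and W4, W13 on W14) are all flagged.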

Updated all internal W-ID references across all documents.

* docs: update production plan with new W-ID order and phase structure

- Update Section 1.1: 16→15 workstreams, module table W-IDs
- Update Section 2.1.2: Checkpoint→Compression Snapshot terminology
- Update Section 2.2: Architecture diagram (Checkpoints→Compression Snapshots)
- Update Section 2.3: Workstream descriptions with all refinements
  - W15: Add dispatch bypass elimination (B1, B2)
  - W10: Clarify offload threshold vs truncation
  - W12: Add current state gap analysis reference
  - W14: Add subagent cache optimization
- Update Section 3.1: Phased delivery plan for new W-ID order
  - Phase 1: W1, W2, W3 (Foundation)
  - Phase 2: W4, W5, W6 (Event Infrastructure)
  - Phase 3: W7, W8, W9, W10, W11 (Lifecycle and Policy)
  - Phase 4: W12, W14 (Compaction and Assembly)
  - Phase 5: W13, W15 (Quality and Fit)
- Update Section 3.2: Gantt chart for new timeline
- Update Section 3.3: Dependency diagram for new order

* docs: fix all W-ID anchor links in production plan

Fixed 52 incorrect anchor links throughout the production plan document.
All [W\d+](#w\d+) links now correctly match the new W-ID numbering:
- W1-W15 links now point to correct anchors (#w1-#w15)
- Updated Section 0.1-0.3 comparison tables
- Updated Section 1.2 detailed improvement table
- Updated Section 2.3 memory control capabilities table
- Updated Section 2.4 ClawVM adoption table
- Updated Section 3.1 phase table

All anchor links now follow the pattern [Wn](#wn) where n matches.
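A check of that invariant can be scripted with one regular expression. This sketch only compares the number in the link text with the number in the anchor; it does not verify that the target heading exists:

```python
import re
from typing import List

# Matches links of the form [W<n>](#w<m>).
W_LINK = re.compile(r"\[W(\d+)\]\(#w(\d+)\)")


def mismatched_w_links(markdown: str) -> List[str]:
    """Return every [Wn](#wm) link whose n and m differ."""
    return [m.group(0) for m in W_LINK.finditer(markdown) if m.group(1) != m.group(2)]
```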

* docs: revise W17 capacity suggestion spec

* docs: rewrite Chinese production plan with new W-ID numbering

- Translate the updated English version (1296 lines) into Chinese (1208 lines)
- Move from doc/working/ to doc/working/context-management-workstreams/
- Update all W-ID references to new numbering (W1-W15)
- W7 marked as retired (compression.snapshot merged into W4)
- New phase structure (5 phases with correct W-ID groupings)
- Technical terms kept in English where appropriate
- Mermaid diagrams preserved in English
- Old file deleted from previous location

* docs(W2): add ADR for budget snapshot overrides and dispatch enforcement

Add W2_ADR_Budget_Snapshot_Overrides_and_Dispatch_Enforcement.md defining:

- Override precedence: operator column > model default > resolver fallback
- Fingerprint algorithm: SHA-256 over W1 fingerprint + W2-specific fields
- DB column: ag_tenant_agent_t.requested_output_tokens nullable positive int
- SDK dispatch assertion: max_tokens must equal snapshot.requested_output_tokens
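A sketch of the precedence rule and the fingerprint, assuming JSON with sorted keys as the canonical encoding. The choice of W2-specific fields, the function names, and treating `resolver_fallback` as always available are assumptions, not the ADR's exact contract:

```python
import hashlib
import json
from typing import Optional


def resolve_requested_output_tokens(operator_value: Optional[int],
                                    model_default: Optional[int],
                                    resolver_fallback: int) -> int:
    """Operator column > model default > resolver fallback."""
    for value in (operator_value, model_default):
        if value is None:
            continue
        if value <= 0:
            raise ValueError(f"requested_output_tokens must be positive, got {value}")
        return value
    return resolver_fallback


def w2_fingerprint(w1_fingerprint: str, requested_output_tokens: int,
                   soft_limit_ratio: float, safe_input_budget: int) -> str:
    """SHA-256 over the W1 fingerprint plus W2-specific fields."""
    payload = {
        "w1_fingerprint": w1_fingerprint,
        "requested_output_tokens": requested_output_tokens,
        "soft_limit_ratio": soft_limit_ratio,
        "safe_input_budget": safe_input_budget,
    }
    # Sorted keys and fixed separators make the encoding deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Chaining the W1 fingerprint into the W2 hash means any change to W1 capacity also changes the W2 fingerprint, so a stale budget snapshot cannot match.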

This ADR formalizes the contracts identified in CM-028, CM-029, CM-030 and
provides the design anchor for W2 implementation steps 3-5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(W2): absorb CM-027-CM-030 findings into spec and production plan

W2 spec updates:

- CM-027: soft_limit_ratio default 0.8, per-tenant override via tenant_config_t
- CM-028: two distinct override contracts (per-agent column + per-request API field)
- CM-029: snapshots are per-model; W13 must invoke W1→W2 chain for compaction model
- CM-030: CM-013 trusted-dispatch enforcement at provider call (assert max_tokens == snapshot.requested_output_tokens)

Production plan updates:
- Per-agent column and per-request API field documented
- soft_limit_ratio default and override path
- per-model snapshot chain for compaction (W13 dependency)
- dispatch assertion contract

All four findings from W2 post-acceptance review now integrated into the spec.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Add W2 capacity budget skeleton

* docs: remove retired W7 strikethrough row from Chinese production plan table

* Add W2 reserve policy configuration

* Implement W2 safe input budget calculator

* docs: add Chinese translations for all W-ID specification documents (W1-W17)

* Resolve W2 request safe input budget

* Apply W2 safe budgets to context manager

* Enforce W2 output tokens at dispatch

* Emit W2 budget snapshots to monitoring

* Surface W2 uncertainty reserve warning

* Verify W2 budget fingerprint at dispatch

* Verify W1 capacity identity at W2 dispatch

Defense-in-depth check per CM-013: the trusted dispatch boundary now
rejects a W2 safe-input-budget snapshot whose `w1_fingerprint`,
`provider`, or `model_name` disagrees with the active W1 capacity
snapshot threaded alongside it. This closes the model-swap mid-flight,
stale-cache, and cross-tenant snapshot-reuse failure modes that the
prior self-only fingerprint check would silently let through.
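A sketch of that boundary check, assuming both snapshots reach the dispatch layer together with the outgoing `max_tokens`. The snapshot classes and `DispatchMismatch` are hypothetical; an explicit `raise` is used instead of `assert` because Python strips `assert` statements under `-O`:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class W1CapacitySnapshot:
    fingerprint: str
    provider: str
    model_name: str


@dataclass(frozen=True)
class W2BudgetSnapshot:
    w1_fingerprint: str
    provider: str
    model_name: str
    requested_output_tokens: int


class DispatchMismatch(RuntimeError):
    pass


def check_dispatch(w1: W1CapacitySnapshot, w2: W2BudgetSnapshot, max_tokens: int) -> None:
    """Reject a call whose W2 budget was not derived from the active W1 snapshot."""
    problems: List[str] = []
    if w2.w1_fingerprint != w1.fingerprint:
        problems.append("w1_fingerprint")  # stale cache or cross-tenant reuse
    if (w2.provider, w2.model_name) != (w1.provider, w1.model_name):
        problems.append("provider/model_name")  # model swapped mid-flight
    if max_tokens != w2.requested_output_tokens:
        problems.append("max_tokens")
    if problems:
        raise DispatchMismatch("dispatch rejected: " + ", ".join(problems))
```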

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Backfill W2 capacity from W1 catalog for legacy deployments

W1 step 7 made context_window_tokens and max_output_tokens required at
the Add/Edit forms, but pre-existing model_record_t rows in production
deployments still have NULL capacity columns and silently disable W2's
CM-030 dispatch enforcement.

This migration backfills capacity from the eight W1 day-one catalog
entries into rows whose (LOWER(model_factory), model_name) matches a
catalog entry and whose capacity is still NULL. It is idempotent (re-runs
are no-ops) and ships as a regular
docker/sql migration so every downstream deployment picks it up on
upgrade.

Rows whose model_factory does not match a catalog provider key
(commonly the manual-add default 'OpenAI-API-Compatible' per CM-031)
are left untouched; the resolver fallback log is upgraded to WARNING
with an actionable remediation message so operators can identify
exactly which models still need attention before W17 ships.
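The idempotent shape of such a backfill can be demonstrated end to end with SQLite. This is a sketch: the `capacity_catalog` helper table, its provider key, and the catalog values are hypothetical (the deployment runs PostgreSQL, and the real migration may inline the catalog), and soft-delete filtering is omitted:

```python
import sqlite3

# Hypothetical catalog rows: (provider key, model_name, context_window, max_output).
CATALOG_ROWS = [
    ("silicon", "example-chat-model", 131072, 8192),
]

# Fill only columns that are still NULL, and only for rows with a catalog hit.
BACKFILL_SQL = """
UPDATE model_record_t
SET context_window_tokens = COALESCE(context_window_tokens, (
        SELECT c.context_window_tokens FROM capacity_catalog AS c
        WHERE c.provider = LOWER(model_record_t.model_factory)
          AND c.model_name = model_record_t.model_name)),
    max_output_tokens = COALESCE(max_output_tokens, (
        SELECT c.max_output_tokens FROM capacity_catalog AS c
        WHERE c.provider = LOWER(model_record_t.model_factory)
          AND c.model_name = model_record_t.model_name))
WHERE (context_window_tokens IS NULL OR max_output_tokens IS NULL)
  AND EXISTS (
        SELECT 1 FROM capacity_catalog AS c
        WHERE c.provider = LOWER(model_record_t.model_factory)
          AND c.model_name = model_record_t.model_name)
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE model_record_t (
            model_id INTEGER PRIMARY KEY,
            model_factory TEXT,
            model_name TEXT,
            context_window_tokens INTEGER,
            max_output_tokens INTEGER
        );
        CREATE TABLE capacity_catalog (
            provider TEXT, model_name TEXT,
            context_window_tokens INTEGER, max_output_tokens INTEGER
        );
    """)
    conn.executemany("INSERT INTO capacity_catalog VALUES (?, ?, ?, ?)", CATALOG_ROWS)


def run_backfill(conn: sqlite3.Connection) -> int:
    """Apply the backfill and return how many rows changed (0 on a re-run)."""
    return conn.execute(BACKFILL_SQL).rowcount
```

A row with `model_factory = 'OpenAI-API-Compatible'` lowercases to a key with no catalog entry, so it fails the `EXISTS` test and stays NULL, which is exactly the CM-031 case.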

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: add codebase gap analysis, reorder priorities, mark deferred workstreams

- Add §1.5 Codebase Gap Analysis to both EN/ZH production plans
- Update §1.2 improvement table with Status column and new priority order
- Move W14 (prompt cache) to Phase 1: high value, zero dependencies
- Mark W5, W6(full), W8(full), W10(artifact), W11(full) as tentatively deferred
- Update Phase table, descriptions, Gantt chart, and dependency diagram
- Add gap analysis notes to W3, W4, W6, W8, W10, W11, W12, W14 docs
- Restructure README workstream index: Active / Deferred / Retired sections

* Make missing-capacity warning operator-friendly and dedup it

Two fixes to the WARNING surfaced when a model has no capacity
configured:

1. Drop internal design-doc jargon. The previous message mentioned
   CM-030, CM-013, and W17 — none of which are meaningful to an
   operator reading backend container logs. Replaced with plain
   English that names what is disabled (output token cap + budget
   consistency check) and the exact UI path to fix it.

2. Deduplicate per process per model_id. Without this, every agent
   run logged the same line, so a tenant with 1k daily messages on a
   bare model would emit 1k duplicate warnings per day and drown
   real signal. A module-level set tracks already-warned model_ids;
   the warning fires once per process per model and is cleared only
   on process restart.

Includes the ResolverError branch which previously had a separate
WARNING line — both branches now route through the same dedup helper.
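A minimal sketch of the per-process, per-model dedup, assuming a module-level set as the commit describes. The helper name, the `reason` argument (used so the missing-capacity and ResolverError branches can share one helper), and the message wording are hypothetical.

```python
import logging
from typing import Set

logger = logging.getLogger("capacity")

# Lives for the lifetime of the process, so it is only cleared on restart.
_warned_model_ids: Set[int] = set()


def warn_missing_capacity_once(model_id: int, reason: str) -> bool:
    """Emit the missing-capacity WARNING once per process per model_id.

    Returns True if a warning was logged, False if it was suppressed.
    Under concurrent threads a benign race can at worst log one duplicate.
    """
    if model_id in _warned_model_ids:
        return False
    _warned_model_ids.add(model_id)
    logger.warning(
        "Model %s has no capacity configured (%s): the output token cap and "
        "the budget consistency check are disabled for it. Configure its "
        "context window and max output tokens to enable them.",
        model_id,
        reason,
    )
    return True
```

Both call sites would pass the same `model_id`, so a model that hits the bare branch and later the ResolverError branch still produces a single line.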

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(W17): add visibility surfaces for existing bare-capacity models

W17's original scope was preventing new bare rows at add/edit time. It
did not address the complementary problem: rows that already exist in
a bare state silently disable W2 enforcement, and the only signal
today is a backend WARNING that the people who can fix it (model
administrators, agent authors) never see.

Adds a new "Visibility for Existing Bare-Capacity Models" section
specifying three UI touchpoints — model management list badge,
agent-edit selector warning, and an operator dashboard widget — backed
by a small read-only GET /api/v1/models/capacity-coverage endpoint.
The visibility work is phase-tagged as 1.5 so it can ship behind a
separate small flag without waiting for the connectivity-integration
and provider-discovery work in later phases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: renumber W-IDs by priority, rename deferred to P-IDs

Active workstreams renumbered by implementation priority:
  W1 (token capacity), W2 (output reserve) - unchanged
  W3 (prompt cache, was W14) - moved to Phase 1
  W4 (tenant isolation, was W3)
  W5 (event log, was W4)
  W6 (compaction reliability, was W12)
  W7 (lifecycle APIs) - unchanged
  W8 (progressive reduction, was W9)
  W9 (quality SLOs, was W13)
  W10 (guaranteed fit, was W15)
  W11 (capacity suggestion, was W17)

Deferred workstreams renamed W→P:
  P1 (history separation, was W5)
  P2 (cache validation, was W6)
  P3 (context policy, was W8)
  P4 (pollution control, was W10)
  P5 (trust/redaction, was W11)

58 files updated: spec files, translations, production plans,
README, ADR, review documents, weekly summary.

* Fix soft-delete column name in W2 catalog backfill migration

The migration filtered on a non-existent column `deleted_flag = 0`,
which never matched any row, so the backfill silently no-op'd on
every deployment. The model_record_t soft-delete column is
`delete_flag` (String(1), default 'N') per backend/database/db_models.py.

Verified on the local cluster: with the corrected filter, the migration
matched the one catalog-eligible row (glm-5.1 on dashscope) and
populated context_window_tokens=200000, max_output_tokens=131072.
Remaining bare rows on the cluster all carry
model_factory='OpenAI-API-Compatible' (CM-031), confirming W17 as
the remediation path for the default-factory population.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(W17): add bare-row production evidence and scope to LLM/VLM only

Two additions to the W17 'Visibility for Existing Bare-Capacity Models'
section:

1. Production evidence: a 2026-06-17 snapshot of model_record_t on a
   live dev cluster showed 6 of 7 non-deleted rows carrying the
   manual-add default model_factory ('OpenAI-API-Compatible'), and the
   W2 catalog backfill matched only 1 row — leaving the model the
   operator was actively chatting with (glm-5) bare. This grounds the
   workstream's motivation in a concrete observation rather than a
   projected concern.

2. Scope clarification: embedding, STT, and TTS rows share the same
   capacity columns but never traverse the W1/W2 path, so a NULL on
   those rows is not a missed enforcement. The badge, agent-edit
   selector notice, dashboard widget, and /capacity-coverage endpoint
   all apply a model_type IN ('llm', 'vlm') filter at the data layer
   to prevent noise on non-LLM rows.
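A minimal sketch of what the capacity-coverage data could look like, assuming an in-memory list of rows; the real endpoint would apply the `model_type IN ('llm', 'vlm')` predicate in SQL. The `ModelRow` shape and the response keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Only these types traverse the W1/W2 path, so only they count as bare.
ENFORCED_TYPES = {"llm", "vlm"}


@dataclass
class ModelRow:
    model_id: int
    model_type: str
    context_window_tokens: Optional[int]


def capacity_coverage(rows: List[ModelRow]) -> Dict[str, Any]:
    """Summarize bare-capacity LLM/VLM rows; embedding/STT/TTS are ignored."""
    relevant = [r for r in rows if r.model_type in ENFORCED_TYPES]
    bare = [r.model_id for r in relevant if r.context_window_tokens is None]
    return {"total": len(relevant), "bare": len(bare), "bare_model_ids": bare}
```

A NULL on an embedding row therefore never shows up in the badge count or the dashboard widget.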

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Raise legacy fallback threshold to 81920 and explain output reserve in UI

Two coordinated changes that both came out of W2 end-to-end validation
against a bare-capacity model (glm-5):

1. Bump the W1/W2 unknown-capacity fallback from 8192 to 81920 in both
   backend (_TOKEN_THRESHOLD_LEGACY_FALLBACK) and frontend
   (TokenUsageIndicator.DEFAULT_THRESHOLD). 8192 was so small that any
   non-trivial conversation triggered compression almost immediately,
   masking real usage signal. 81920 fits the input budget of any
   modern 32K+ LLM; if the actual model is smaller and bare, the
   provider returns a clear token-overflow error at request time
   rather than the system silently truncating. Both sides match so the
   indicator denominator and the backend compression trigger stay in
   sync when the snapshot path is not available.

2. Add a tooltip on the agent-edit "Output Reserve" form item so model
   admins and agent authors understand the field's physical meaning:
   it carves output space out of the context window, and the trade-off
   between longer replies versus more retained history is explicit.
   Tooltip strings live in both zh and en common.json.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Retune legacy capacity fallback from 81920 to 32768

After bumping the bare-capacity fallback up from 8192 to 81920 in
commit 689e3ec52, 81920 was on the optimistic side: it presumes most
unknown models can absorb ~80K tokens of input. Many production
deployments still rely on the 16K-32K context band (GPT-3.5 Turbo 16K,
GLM-4 32K, Qwen2 32K, Llama 3 32K, Mistral 32K, etc.), and an 80K
input on a 32K model produces a provider-side token-overflow rejection.

32768 is the conservative compromise: it covers the majority of
production LLMs without inviting overflow on the still-common 32K
class. Models with larger windows lose only a few extra compression
cycles, which is the correct cost direction (slightly more work over
silent overflow). Backend (_TOKEN_THRESHOLD_LEGACY_FALLBACK) and
frontend (TokenUsageIndicator.DEFAULT_THRESHOLD) stay in sync so the
indicator denominator matches the backend compression trigger when
the W2 snapshot path is unavailable.
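A minimal sketch of how the fallback plugs into the compression trigger. Only the constant name and value come from this commit; the helper names and the `>=` comparison are assumptions. The frontend TokenUsageIndicator.DEFAULT_THRESHOLD would hold the same number so the indicator denominator matches.

```python
from typing import Optional

# Used when no W2 snapshot (resolved input budget) exists for the model.
_TOKEN_THRESHOLD_LEGACY_FALLBACK = 32768


def compression_threshold(snapshot_input_budget: Optional[int]) -> int:
    """Token count at which history compression kicks in."""
    if snapshot_input_budget is None:
        return _TOKEN_THRESHOLD_LEGACY_FALLBACK
    return snapshot_input_budget


def should_compress(used_tokens: int, snapshot_input_budget: Optional[int]) -> bool:
    # Assumption: compression triggers once usage reaches the threshold.
    return used_tokens >= compression_threshold(snapshot_input_budget)
```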

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: add capacity values explainer covering W1/W2/W3 number flow

Single-file reference doc walking from UI-visible capacity columns
(context_window, max_output, default_reserve) through W1 resolver
output (provider_input_limit, fingerprint), W2 calculator output
(soft / hard input budget, uncertainty reserve), and the four-tier
override chain for requested_output_tokens (CM-028). Includes worked
examples for the standard configuration, agent-level override, the
RequestedOutputExceedsCap failure mode, and the bare-capacity
fallback path. Intended audience: model admins, agent authors, and
engineers reviewing W1/W2/W3 specs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Enforce output reserve ceiling at the agent-edit form

Closes the UX gap where 'Output Reserve' accepted values exceeding
the selected model's max_output_tokens. The capacity resolver caught
the violation only at agent run time, raising RequestedOutputExceedsCap
and failing the conversation with no surface signal to the agent author.

Three additions on AgentGenerateDetail:

- A conditional Form.Item rule that pins the field's max to the
  currently selected model's maxOutputTokens. The rule is omitted on
  bare-capacity models (maxOutputTokens undefined) where the resolver
  cannot enforce anything anyway.
- A matching `max` prop on the InputNumber so the stepper UI also
  blocks the value, not just the validator.
- A useEffect that re-runs validation on requestedOutputTokens
  whenever the selected model's maxOutputTokens changes, so switching
  from a 32K-output model down to an 8K-output one immediately
  surfaces the conflict rather than waiting until save.

New i18n key agent.requestedOutputTokens.maxError interpolates the
actual ceiling so the error message names the number.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Reject max_input_tokens > context_window_tokens on both ends

Closes the audit gap noticed alongside the W2 UX fix: an operator
fills max_input_tokens above context_window_tokens, the save succeeds,
and the override is silently clipped at runtime because the resolver
computes provider_input_limit = min(max_input, context_window -
requested_output). The administrator's value never takes effect and
no error or log surfaces.
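A worked example of the silent clipping, using the formula quoted above with illustrative numbers: an administrator enters 40000 for max_input on a 32768-token window while 4096 tokens are reserved for output.

```python
def provider_input_limit(max_input: int, context_window: int, requested_output: int) -> int:
    """Smaller of the admin override and the window left after the output reserve."""
    return min(max_input, context_window - requested_output)


# The override never takes effect: 32768 - 4096 = 28672 wins the min().
print(provider_input_limit(40000, 32768, 4096))  # 28672
```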

Backend fix in capacity_resolver: raise InvalidCapacityConfiguration
with a message that names the silent-clipping mechanism so the
operator understands why the override was rejected. The check sits
right next to the sibling max_output_tokens > context_window check,
keeping all cross-field invariants in one place.

Frontend fix in validateCapacityForm: add the same cross-field check
with a matching i18n key (model.dialog.capacity.error.inputExceedsWindow,
zh + en). Surfaces inside the existing ModelEditDialog and
ModelAddDialog save flow that already wires validateCapacityForm.

Tests: two new cases on test_capacity_resolver — rejection of
max_input above the window, and acceptance of the equality boundary
(max_input == context_window is legal).
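A minimal sketch of the two cross-field invariants, assuming a local stand-in for the resolver's InvalidCapacityConfiguration and assuming bare rows (no context window) are skipped. The function name and messages are hypothetical; the equality boundary is accepted as the tests describe.

```python
from typing import Optional


class InvalidCapacityConfiguration(ValueError):
    """Local stand-in for the resolver's exception of the same name."""


def check_capacity_invariants(
    context_window_tokens: Optional[int],
    max_input_tokens: Optional[int] = None,
    max_output_tokens: Optional[int] = None,
) -> None:
    """Reject cross-field values that could never take effect."""
    if context_window_tokens is None:
        return  # bare row: nothing to compare against
    if max_output_tokens is not None and max_output_tokens > context_window_tokens:
        raise InvalidCapacityConfiguration(
            f"max_output_tokens ({max_output_tokens}) exceeds "
            f"context_window_tokens ({context_window_tokens})"
        )
    if max_input_tokens is not None and max_input_tokens > context_window_tokens:
        raise InvalidCapacityConfiguration(
            f"max_input_tokens ({max_input_tokens}) exceeds context_window_tokens "
            f"({context_window_tokens}); it would be silently clipped to the "
            "window minus the requested output"
        )
```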

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Raise SDK requested_output_tokens fallback from 1024 to 4096

The four-tier override chain for requested_output_tokens ends with a
hard-coded SDK constant when neither the agent ('Output Reserve' field)
nor the model record (default_output_reserve_tokens column) provides a
value. The model-add UI does not render default_output_reserve_tokens
at all (only edit mode does), so newly added rows always carry NULL in
that column and most agents reach the SDK fallback at runtime.

1024 was too small in practice. Tool-using agents emit a few-hundred-
token JSON tool call plus a few hundred tokens of thought per step;
1024 frequently truncated the JSON mid-emission, which then surfaced
as a tool-call failure instead of a capacity-config issue. The W2
fingerprint chain stays green and the indicator denominator looks
healthy, but replies and tool calls get silently chopped.

4096 covers the median single-turn output for tool chains, short
reports, and modest code generation. Models with a smaller
max_output_tokens are still safe: the existing
RequestedOutputExceedsCap check at capacity_resolver.py:276-283 (and
the matching agent-edit Form.Item rule from the prior commit) catches
the violation explicitly rather than silently truncating.

No tests assumed 1024; the full test_capacity_resolver suite stays
green (17 passing).
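A minimal sketch of the tail of the override chain described above: the agent's 'Output Reserve', then the model's default_output_reserve_tokens, then the SDK constant, followed by the cap check. The fourth tier is not described in this commit and is omitted; the function name and the local exception stand-in are hypothetical.

```python
from typing import Optional

SDK_REQUESTED_OUTPUT_FALLBACK = 4096  # was 1024 before this commit


class RequestedOutputExceedsCap(ValueError):
    """Local stand-in for the resolver's exception of the same name."""


def resolve_requested_output(
    agent_output_reserve: Optional[int],
    model_default_output_reserve: Optional[int],
    model_max_output_tokens: Optional[int],
) -> int:
    # First configured value wins: agent field, model column, SDK constant.
    if agent_output_reserve is not None:
        requested = agent_output_reserve
    elif model_default_output_reserve is not None:
        requested = model_default_output_reserve
    else:
        requested = SDK_REQUESTED_OUTPUT_FALLBACK
    # Fail explicitly rather than truncate; skipped for bare models.
    if model_max_output_tokens is not None and requested > model_max_output_tokens:
        raise RequestedOutputExceedsCap(
            f"requested {requested} output tokens, model cap is {model_max_output_tokens}"
        )
    return requested
```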

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: refresh Capacity Values Explainer after UX gap fixes

Sync the explainer with the just-landed capacity changes so the doc
stops describing the older silent-failure behavior:

- Override chain (§3) now names the SDK fallback as 4096 (was 1024)
  and includes a short note o…
* Move non-shadcn ui component to other folder

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix inability to select agent from agent space to edit

* Bugfix: Display correct version info when viewing agent details

* Bugfix: Adjust agent detail UI layout to accommodate newly added "self-verification" field

* Refactor: update left navigation menu

* Remove the quick configuration page

* Remove comments

* Update i18n

* Bugfix: Fix i18n translation issues in navigation sidebar
* 🐛 Bugfix: Update HTTP client settings to increase timeout and disable SSL verification in aidp_service and aidp_search_tool (#3280)

* 🐛 Bugfix: Fix page show

* 🐛 Bugfix: Prevent saving null values in tool parameters across backend and frontend components. Ensure only defined values are used when merging and updating tool configurations.

* 🐛 Bugfix: Ensure `useSaveGuard` returns true upon successful save and update unit tests to reflect changes in return type for tool instance creation and update.
jeffwu-1999 force-pushed the wzf_add_support_for_uploading_files branch from f29d489 to ea1d235 on June 25, 2026 12:16
hhhhsc701 and others added 5 commits June 26, 2026 10:28
* Refactor prompt and skill assets

* Add unified uninstall entrypoints and image build selection

* Expand image build script with interactive selection

* Simplify image build defaults and remove deprecated deploy scripts

* Refactor prompt and agent infrastructure

* Make SQL migrations idempotent

* Ignore legacy env when config values are loaded

* Add secret rotation and Elasticsearch key refresh support

* Remove obsolete init SQL comments

* Update NEXENT_SQL_STARTUP_MODE to 'off' and enhance deployment scripts

* Add shared hostPath storage for workspace and skills

* Refactor image builds for variant-specific dependencies

* Refactor prompt handling and improve agent workflow

* fix: remove obsolete comment on skill configuration parameters in migration file

* fix: update offline package build process to create zip instead of tar.gz

---------

Co-authored-by: hhhhsc <name>
* Release/v2.2.1 (#3269)

* add_greeting_fields_to_agent-develop

* feat(knowledge-base): add preserve_source_file and post-index source cleanup

Let knowledge bases opt out of keeping uploaded MinIO copies after indexing
while retaining Elasticsearch chunks for retrieval. Default behavior remains
preserve_source_file=true for backward compatibility.

- Add preserve_source_file column (init.sql + v2.2.0_0601 migration)
- Accept preserve_source_file on create/update and northbound/vector APIs
- Support document DELETE scope=source_only and source_available in listings
- Run cleanup_source Celery task when preserve_source_file is false
- UI: create-KB toggle, list tag, knowledge-base preview when copy is missing
- Update vector-database SDK docs and backend tests
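A minimal sketch of the post-index decision, assuming a document record with a `source_available` flag and an injected deletion callable standing in for the cleanup_source Celery task. The `IndexedDocument` type and `after_indexing` helper are hypothetical; the default `preserve_source_file=True` mirrors the backward-compatible behavior above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IndexedDocument:
    object_name: str
    source_available: bool = True


def after_indexing(
    doc: IndexedDocument,
    delete_source: Callable[[str], None],
    preserve_source_file: bool = True,
) -> IndexedDocument:
    """Drop the MinIO copy once chunks are indexed, unless the KB keeps it."""
    if not preserve_source_file:
        delete_source(doc.object_name)  # stand-in for the cleanup_source task
        doc.source_available = False    # listings can then show the copy is gone
    return doc
```

The Elasticsearch chunks are untouched either way, so retrieval keeps working when the source copy is gone.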

* test(data_process): stub knowledge_db, redis_service, and redis in test_worker

Align setup_mocks_for_worker with test_tasks so importing
backend.data_process.worker loads package __init__ without real DB/redis deps.

* test(data_process): shim cleanup_source for submit_process_forward_chain tests

* remove duplicate import

* fix: update unit tests for greeting_message and example_questions fields

* add init.sql to sonar.properties

* ♻️ Improvement: API to MCP conversion service supports configuring headers. (#3194)

* ♻️ Improvement: API to MCP conversion service supports configuring headers.
[Specification Details]
1. Front-end and back-end modifications

* ♻️ Improvement: API to MCP conversion service supports configuring headers.
[Specification Details]
1. Modify the frontend so the HTTP headers are reset to empty after adding.
2. Modify test cases.

* ♻️ Improvement: Enhance processing of ES index names in memory banks. (#3196)

[Specification Details]
1. Replace every character in the index name that violates the naming rules with "_".
2. Modify test cases.
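A minimal sketch of the sanitization rule, assuming the allowed set is lowercase ASCII letters, digits, "_" and "-". The leading-character strip, the 255-character cap, and the "index" fallback are additions for Elasticsearch's other naming rules and are not described in the commit; the function name is hypothetical.

```python
import re


def sanitize_index_name(raw: str) -> str:
    """Map an arbitrary string to a lowercase Elasticsearch-safe index name."""
    name = re.sub(r"[^a-z0-9_-]", "_", raw.lower())
    # Elasticsearch also rejects names that start with "_", "-" or "+".
    name = name.lstrip("_-")[:255]
    return name or "index"  # hypothetical fallback for all-symbol input
```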

* feat: add active memory tools (StoreMemoryTool, SearchMemoryTool) (#3197)

- Implement StoreMemoryTool for explicit memory storage during agent reasoning
- Implement SearchMemoryTool for on-demand memory retrieval during conversations
- Integrate tools into agent creation flow (create_agent_info.py)
- Register tools in nexent_agent.py and tools/__init__.py
- Add MEMORY_OPERATION tool sign for proper categorization
- Fix memory_core.py cache key to include event loop ID (prevents cross-loop conflicts)
- Add comprehensive test coverage for both tools
- Add procedural memory verification documentation

Tools follow existing patterns: lazy imports, observer integration, error handling,
and respect user memory preferences (agent_share_option, disabled_agent_ids).
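A minimal sketch of an event-loop-aware cache key, assuming the cached object is something bound to the loop that created it (for example an async client). The cache and function names are hypothetical. One caveat: `id()` values can be reused after a loop is garbage-collected, so a production version should also evict entries for closed loops.

```python
import asyncio
from typing import Dict, Tuple

_clients: Dict[Tuple[str, int], object] = {}


def get_memory_client(config_key: str) -> object:
    """Return a client cached per (config, currently running event loop)."""
    loop = asyncio.get_running_loop()
    key = (config_key, id(loop))
    if key not in _clients:
        _clients[key] = object()  # stand-in for an expensive async client
    return _clients[key]
```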

Co-authored-by: Dallas98 <40557804+Dallas98@users.noreply.github.com>

* 🐛 Bugfix: skill names and descriptions are never loaded into context (#3205)

* 🐛 Bugfix: skill names and descriptions are never loaded into context

* 🐛 Bugfix: skill names and descriptions are never loaded into context

* 🐛 Bugfix: skill names and descriptions are never loaded into context

* 🐛 Bugfix: official skills not copied to target directory

* 🐛 Bugfix: official skills not copied to target directory

* Feat: add selected count badges to tool/skill pool labels (#3206)

Co-authored-by: chase <byzhangxin11@126.com>

* 🐛 Bugfix: Fix attribution error when a tool call fails (#3208)

* ✨ Feat: Add support for Word document generation, preview, and download (#3191)

* Feat: Add support for Word document generation, preview, and download

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Restrict uploads to a known safe workspace/output directory

* Modify unit tests

* Fix unit tests

* Bugfix: Store files uploaded in conversation messages in MinIO so they remain visible in history

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* ✨Feat:Enhance prompt optimization by integrating openjiuwen and fix related bugs (#3190)

* ✨Feat:add prompt optimization

* 🐛Bugfix: dockerbuild failed when running pipefail in python3_11

* 🔨Optimize: Optimize prompt optimization display page and interaction methods

* 🐛Bugfix: fix dependencies replication

* 🎨:Optimize frontend prompts and loading interface

* 🔧 Refactor: Update imports and remove redundant ENABLE_JIUWEN_SDK import in prompt_service.py

* 🔧 Refactor: Correct import path for NexentCapabilityError and enhance test coverage for prompt optimization service

* 🔧 Refactor: Update import paths for exception handling and improve logging formatting in prompt_service.py

* 🔧 Refactor: Simplify lazy imports in jiuwen_sdk_adapter.py and update import paths in prompt_service.py

* 🔧 Refactor: Enhance Jiuwen SDK adapter handling and improve test stubs in prompt_service.py and related test files

* 🧪test:Pydantic model for PromptTemplateRequest in test_prompt_template_app.py

* 🔧 Refactor: Remove unnecessary dependency exclusions from pyproject.toml

* 🔧 Update: Upgrade huggingface_hub dependency version in pyproject.toml

* 🔧 Update: Exclude unnecessary transitive dependencies and adjust huggingface_hub version in pyproject.toml

* 🔧 Test: Add mock modules for unstructured inference and set up package paths in test files

* 🔧 Test: Enhance test setup by adding optional SDK mocks and cleaning up module imports in data processing tests

* 🔧 Test: Consolidate mock module setup for unstructured inference across multiple test files

* 🔧 Test: Remove unused optional SDK mocks from test configuration

* 🔧 Refactor: Clean up imports and enhance dynamic loading of fastmcp components in Docker client

* 📦update: SDK dependency update

* Add CAS SSO integration and improve logout handling (#3072)

* feat: add CAS SSO integration

* Skip CAS logout when CAS_LOGOUT_URL is unset

* Remove escaping

* Improve CAS logout handling and confirm user logout

* Disable account deletion for CAS users

* Add CAS session init SQL and k8s config

* clean code

* Remove agent guardrails design doc from tracking

* Add supplementary documentation

---------

Co-authored-by: hhhhsc <name>

* 🐛Bugfix: Remove unnecessary dependency exclusions and upgrade huggingface_hub version in pyproject.toml (#3211)

* refactor: move current time from system prompt to user message for prompt cache stability (#3203)

Remove {{time}} from all 4 prompt YAML templates (manager/managed × en/zh)
and strip time_str from the context_utils pipeline (_format_app_context,
build_skeleton_header_component, build_context_components,
build_app_context_string). Also remove time from create_agent_info render
kwargs and build_context_components call.

In CoreAgent.run, prepend [Current time: ...] to self.task so the timestamp
travels with the user message instead of being baked into the system prompt.
This makes the rendered system prompt fully deterministic per (agent_id,
tenant_id, version_no, language) — enabling prompt/KV cache hits across
requests for the same agent config.

Sync test_context_utils.py: drop time_str= from 3 test cases.

Remove unused datetime imports from context_utils.py and create_agent_info.py.
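A minimal sketch of why moving the timestamp helps caching: with no time slot, the rendered system prompt is a pure function of the agent configuration, and the time rides on the user message instead. The template text, the function names, the timestamp format, and the newline separator are assumptions; only the `[Current time: ...]` prefix comes from the commit.

```python
from datetime import datetime

# Hypothetical template with no {{time}} slot: output depends only on config.
SYSTEM_TEMPLATE = (
    "You are agent {agent_id} for tenant {tenant_id} "
    "(version {version_no}, language {language})."
)


def render_system_prompt(agent_id: int, tenant_id: str, version_no: int, language: str) -> str:
    return SYSTEM_TEMPLATE.format(
        agent_id=agent_id, tenant_id=tenant_id, version_no=version_no, language=language
    )


def build_task(user_task: str, now: datetime) -> str:
    # The timestamp travels with the user message, not the system prompt.
    return f"[Current time: {now:%Y-%m-%d %H:%M:%S}]\n{user_task}"
```

Two requests for the same agent configuration now share an identical system prompt prefix, which is what lets a provider-side prompt/KV cache hit.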

* 🐛 Bugfix: Fixed the issue of being unable to add MCP services via containerization. (#3213)

[Specification Details]
1. Modify the DEFAULT_NETWORK_NAME when starting the MCP service in the container to match the name in docker-compose.
2. Modify the parameters passed to the add_mcp_service method; custom_headers defaults to None.

* 🐛 Bugfix: Fixed the issue where uploaded text files could not be parsed during a session. (#3219)

* 🐛 Bugfix: Fixed the issue where uploaded text files could not be parsed during a session.
[Specification Details]
1. The return value of the file_process method has changed and now needs to be unpacked.

* 🐛 Bugfix: Fixed the issue where uploaded text files could not be parsed during a session.
[Specification Details]
1. Modify test case.

* 🐛 Bugfix: Fixed an issue where the MCP service could not be added correctly after updating the FastMCP version. (#3222)

[Specification Details]
1. Add `**kwargs` to the `create_httpx_client` function so it accepts any additional keyword arguments.

* 🐛 Bugfix: Fix incomplete display of tenant resources page after window resize (#3215)

* Move non-shadcn ui component to other folder

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Add agent marketplace repository and version pinning for sub-agents (#3239)

* feat: add agent marketplace repository and pin sub-agent versions at publish

Introduce ag_agent_repository_t with list/status/publish/import APIs for
frozen agent snapshots. Pin selected_agent_version_no on agent relations when
publishing so sub-agents resolve to a fixed version at runtime. Extend agent
export/import to bundle skills in ZIP payloads and add embedding model fallback
when no model name is provided.
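A minimal sketch of version pinning at publish time, assuming a relation record carrying `selected_agent_version_no` and a map of each sub-agent's current version. The `AgentRelation` shape, the helper names, and the behavior of unpinned relations (they follow the current version) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AgentRelation:
    parent_agent_id: int
    sub_agent_id: int
    selected_agent_version_no: Optional[int] = None


def pin_on_publish(relations: List[AgentRelation], current_versions: Dict[int, int]) -> None:
    """Freeze each sub-agent at the version that is current at publish time."""
    for rel in relations:
        rel.selected_agent_version_no = current_versions[rel.sub_agent_id]


def resolve_sub_agent_version(rel: AgentRelation, current_versions: Dict[int, int]) -> int:
    # Pinned relations ignore later sub-agent releases.
    if rel.selected_agent_version_no is not None:
        return rel.selected_agent_version_no
    return current_versions[rel.sub_agent_id]
```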

* feat(agent): add verification configuration for agents and update related components (#3174)

* feat(agent): add verification configuration for agents and update related components

* feat(model): update model type labels and add monitoring dashboard translations

* 🐛 Bugfix: Fix inability to select agent from agent space to edit (#3240)

* Move non-shadcn ui component to other folder

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix inability to select agent from agent space to edit

* Bugfix: Display correct version info when viewing agent details

* Update data agent and ME CAS integration documentation (#3242)

* Add DataAgent integration documentation

* Add ME CAS integration documentation

* Add ME CAS integration documentation

---------

Co-authored-by: hhhhsc <name>

* ✨ Add several northbound apis (#3223)

* ✨ Add several northbound apis

* ✨ Add several northbound apis

* ✨ Add several northbound apis

* ✨ Add several northbound apis

* ✨ Add several northbound apis

* refactor: simplify deployment script by removing unused variables and functions (#3245)

* feat(agent): add verification configuration for agents and update related components

* feat(model): update model type labels and add monitoring dashboard translations

* refactor(build_offline_package): simplify deployment script by removing unused variables and functions

* 🐛 Bugfix: Adjust agent detail UI layout to accommodate newly added "self-verification" field (#3246)

* Move non-shadcn ui component to other folder

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix incomplete display of tenant resources page after window resize

* Bugfix: Fix inability to select agent from agent space to edit

* Bugfix: Display correct version info when viewing agent details

* Bugfix: Adjust agent detail UI layout to accommodate newly added "self-verification" field

* Add SQL (#3248)

* Add SQL

* Raise the limit value

* 🐛 Bugfix: Fixed an issue where the MCP service failed to start in a Kubernetes container. (#3254)

[Specification Details]
1. Modify the pod naming logic to convert all non-compliant characters to -.
2. Modify test cases.
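A minimal sketch of the pod-name rule, assuming the target is a DNS-1123 label: lowercase alphanumerics and "-", starting and ending with an alphanumeric, at most 63 characters. Dots are replaced too, which is stricter than the pod-name rule itself. The function name and the "mcp-service" fallback are hypothetical.

```python
import re


def to_pod_name(raw: str, max_len: int = 63) -> str:
    """Convert an MCP service name into a Kubernetes-safe pod name."""
    name = re.sub(r"[^a-z0-9-]", "-", raw.lower())
    # Truncate first, then strip so the result starts and ends alphanumeric.
    name = name[:max_len].strip("-")
    return name or "mcp-service"
```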

* 🐛 Bugfix: knowledge_base_search_tool called with TypeError: argument of type 'FieldInfo' is not iterable (#3259)

* 🐛 Bugfix: Fixed an issue where the one-click rename function failed after importing an agent. (#3258)

[Specification Details]
1. The frontend does not pass `agent_id` when calling the `regenerate_name` API.

* Bugfix: Exclude attachments from assistant when saving conversation history (#3261)

* Bump APP_VERSION from v2.2.0 to v2.2.1 (#3268)

The default setting for client-side self-validation is "False".

---------

Co-authored-by: chase <byzhangxin11@126.com>
Co-authored-by: Chenlifeng <174292121+Lifeng-Chen@users.noreply.github.com>
Co-authored-by: Dallas98 <40557804+Dallas98@users.noreply.github.com>
Co-authored-by: Jason Wang <56037774+JasonW404@users.noreply.github.com>
Co-authored-by: Xia Yichen <iamjasonxia@126.com>
Co-authored-by: JeffWu <45140512+jeffwu-1999@users.noreply.github.com>
Co-authored-by: WMC001 <46217886+WMC001@users.noreply.github.com>
Co-authored-by: xuyaqi <xuyaqist@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: DongJiBao2001 <120021235+DongJiBao2001@users.noreply.github.com>
Co-authored-by: hhhhsc701 <56435672+hhhhsc701@users.noreply.github.com>
Co-authored-by: Dallas98 <990259227@qq.com>
Co-authored-by: frr <64584192+wuyuanfr@users.noreply.github.com>

* Revert "Release/v2.2.1 (#3269)" (#3272)

This reverts commit 9ff420e.

* ✨ Feature: add agent repository page and APIs

Introduce Agent Repository backend APIs, database/service support, frontend views, client services, and tests. Migrate Agent Space navigation and permissions to /agent-repository with updated SQL and localization.

---------

Co-authored-by: panyehong <91180085+YehongPan@users.noreply.github.com>
Co-authored-by: chase <byzhangxin11@126.com>
Co-authored-by: Dallas98 <40557804+Dallas98@users.noreply.github.com>
Co-authored-by: Jason Wang <56037774+JasonW404@users.noreply.github.com>
Co-authored-by: Xia Yichen <iamjasonxia@126.com>
Co-authored-by: JeffWu <45140512+jeffwu-1999@users.noreply.github.com>
Co-authored-by: WMC001 <46217886+WMC001@users.noreply.github.com>
Co-authored-by: xuyaqi <xuyaqist@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: DongJiBao2001 <120021235+DongJiBao2001@users.noreply.github.com>
Co-authored-by: hhhhsc701 <56435672+hhhhsc701@users.noreply.github.com>
Co-authored-by: Dallas98 <990259227@qq.com>
Co-authored-by: frr <64584192+wuyuanfr@users.noreply.github.com>
* refactor context manager assembly for W3

* test: align W3 context runtime unit tests

* fix: mount conversation context manager in runtime

* fix: address sonarcloud context quality issues

* fix: reduce OpenAIModel constructor parameter count

* test: reduce duplicated context setup

* test: cover input budget resolver handoff

* fix: isolate managed context runtime state
…ions (#3306)

* Add offline package compression and pull skipping

* ✨ Update installation and deployment instructions for Docker and Kubernetes

---------

Co-authored-by: hhhhsc <name>
  - Add file attachment upload/preview/remove UI in debug panel
  - Upload files to MinIO and pass minio_files in agent run params
  - Support file attachments in both debug and compare modes
  - Include attachment info in conversation history
  - Update data_process_service to return img_info alongside chunks
  - Make object_name/presigned_url optional in conversationService types
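A minimal sketch of assembling the run parameters with attachments. Only `minio_files`, `object_name`, and `presigned_url` come from the PR description; the `Attachment` type and the `query` and `name` keys are hypothetical, and omitting unset optional fields rather than sending nulls is a design assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Attachment:
    name: str
    object_name: Optional[str] = None    # optional, mirroring the type change
    presigned_url: Optional[str] = None


def build_agent_run_params(query: str, attachments: List[Attachment]) -> Dict[str, Any]:
    minio_files = []
    for att in attachments:
        entry: Dict[str, Any] = {"name": att.name}
        # Omit optional fields that were never set instead of sending nulls.
        if att.object_name is not None:
            entry["object_name"] = att.object_name
        if att.presigned_url is not None:
            entry["presigned_url"] = att.presigned_url
        minio_files.append(entry)
    return {"query": query, "minio_files": minio_files}
```

The same builder would serve both debug and compare modes, since each pane sends the same attachment list.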
jeffwu-1999 force-pushed the wzf_add_support_for_uploading_files branch from ea1d235 to a73b04f on June 26, 2026 07:40