73 commits
cad1e7a [Feature] Add troubleshoot and stats-cache-hitratio skills (Apr 12, 2026)
5346b51 chore: remove sample log fixtures per request (mouxinqq, Apr 12, 2026)
d0326fc [Feature] Add skill (mouxinqq, Apr 12, 2026)
4ced999 fix(stat-cache-hitrate): include dated span and markdown summary output (mouxinqq, Apr 12, 2026)
26b734b [Feature] Add skills (mouxinqq, Apr 12, 2026)
be5c4f5 Fix stat-cache-hitrate path links for terminal output (mouxinqq, Apr 12, 2026)
0f8c87d [Feature] Add skills (mouxinqq, Apr 12, 2026)
5d29849 split session and window logic out of stat_cache_hitrate (mouxinqq, Apr 12, 2026)
9b57acc Merge pull request #4 from mouxinqq/codex/modify-stats-cache-hitratio… (mouxinqq, Apr 12, 2026)
c32897d refine merged-window format and print session detail link (mouxinqq, Apr 12, 2026)
d7460ac Merge branch 'develop' into codex/modify-stats-cache-hitratio-path-si… (mouxinqq, Apr 12, 2026)
65c4dbd Merge pull request #5 from mouxinqq/codex/modify-stats-cache-hitratio… (mouxinqq, Apr 12, 2026)
37dbd78 Improve stat-cache-hitrate UX and running metric normalization (mouxinqq, Apr 12, 2026)
9db0ec9 Merge pull request #6 from mouxinqq/codex/modify-stats-cache-hitratio… (mouxinqq, Apr 12, 2026)
5791a80 Improve skill reports: markdown session tables and timestamped output… (mouxinqq, Apr 12, 2026)
51a5457 Merge pull request #7 from mouxinqq/codex/refactor-stats-cache-hitrat… (mouxinqq, Apr 12, 2026)
fe64031 Refine session detail output: indexed IDs, trace fallback, and switch… (mouxinqq, Apr 12, 2026)
0839061 Merge branch 'develop' into codex/refactor-stats-cache-hitratio-struc… (mouxinqq, Apr 12, 2026)
8a5ddea Merge pull request #8 from mouxinqq/codex/refactor-stats-cache-hitrat… (mouxinqq, Apr 12, 2026)
81be2a2 Improve session detail markdown id_type summary and table alignment (mouxinqq, Apr 12, 2026)
7d19e4e Merge pull request #9 from mouxinqq/codex/update-stats-cache-hitratio… (mouxinqq, Apr 12, 2026)
b8f12c4 Simplify time-range prompt flow in stat-cache-hitrate skill (mouxinqq, Apr 12, 2026)
28b5609 Merge branch 'develop' into codex/update-stats-cache-hitratio-markdow… (mouxinqq, Apr 12, 2026)
96df827 Merge pull request #10 from mouxinqq/codex/update-stats-cache-hitrati… (mouxinqq, Apr 12, 2026)
8564f9c Unify full session detail table columns with Top20 (mouxinqq, Apr 12, 2026)
03c4cf9 Merge branch 'develop' into codex/update-stats-cache-hitratio-markdow… (mouxinqq, Apr 12, 2026)
c4fb2a0 Merge pull request #11 from mouxinqq/codex/update-stats-cache-hitrati… (mouxinqq, Apr 12, 2026)
322b98b infer mixed token-select counts from router semantics (mouxinqq, Apr 12, 2026)
2df1eec count token-select by prefill/mixed worker type only (mouxinqq, Apr 12, 2026)
a75419e Merge pull request #12 from mouxinqq/codex/update-skill-creator-for-u… (mouxinqq, Apr 12, 2026)
83279b7 troubleshoot: clarify DEGRADED meaning in report header (mouxinqq, Apr 12, 2026)
f225259 Merge pull request #13 from mouxinqq/codex/modify-troubleshoot-for-sk… (mouxinqq, Apr 12, 2026)
55657a4 troubleshoot: revert trend windows to auto and split detail outputs b… (mouxinqq, Apr 13, 2026)
b30bd2b Merge branch 'develop' into codex/modify-troubleshoot-for-skills-alig… (mouxinqq, Apr 13, 2026)
8e202c2 Merge pull request #14 from mouxinqq/codex/modify-troubleshoot-for-sk… (mouxinqq, Apr 13, 2026)
5ac7cb3 troubleshoot: map token-release type by worker URL instead of time-ne… (mouxinqq, Apr 13, 2026)
92efef6 Merge branch 'develop' into codex/modify-troubleshoot-for-skills-alig… (mouxinqq, Apr 13, 2026)
e5ae147 Merge pull request #15 from mouxinqq/codex/modify-troubleshoot-for-sk… (mouxinqq, Apr 13, 2026)
c8a3fd7 fix(troubleshoot): add FIFO+ID consistency checks and quote-safe hint (mouxinqq, Apr 13, 2026)
d234a5f Merge branch 'develop' into codex/modify-troubleshoot-for-skills-alig… (mouxinqq, Apr 13, 2026)
ad1ee5d Merge pull request #16 from mouxinqq/codex/modify-troubleshoot-for-sk… (mouxinqq, Apr 13, 2026)
a317cec fix(load): treat positive req delta as possible in-flight requests (mouxinqq, Apr 13, 2026)
b45b91c Merge branch 'develop' into codex/modify-troubleshoot-for-skills-alig… (mouxinqq, Apr 13, 2026)
bc2d4f2 Merge pull request #17 from mouxinqq/codex/modify-troubleshoot-for-sk… (mouxinqq, Apr 13, 2026)
0afcff2 [Feature] Add troubleshoot and stats-cache-hitratio skills (Apr 13, 2026)
e03b69f docs: add router troubleshoot playbook with skill workflow (mouxinqq, Apr 13, 2026)
53ef365 Merge pull request #18 from mouxinqq/codex/modify-troubleshoot-skills… (mouxinqq, Apr 13, 2026)
09c1824 [Feature] Add troubleshoot and stats-cache-hitratio skills (Apr 13, 2026)
fe3c0d1 Adjust trace input flow to direct prompt instead of AskUserQuestion (mouxinqq, Apr 13, 2026)
980d0ff Merge pull request #19 from mouxinqq/codex/update-troubleshoot-in-gol… (mouxinqq, Apr 13, 2026)
b65a31f Store trace detail markdowns under detail/trace subfolder (mouxinqq, Apr 13, 2026)
d1d19aa Merge branch 'develop' into codex/update-troubleshoot-in-golang_route… (mouxinqq, Apr 13, 2026)
2fb6e34 Merge pull request #20 from mouxinqq/codex/update-troubleshoot-in-gol… (mouxinqq, Apr 13, 2026)
41f56f7 stat-cache-hitrate: remove watch mode and loop guidance (mouxinqq, Apr 13, 2026)
26fb4b4 Merge pull request #21 from mouxinqq/codex/modify-stat-cache-hitratio… (mouxinqq, Apr 13, 2026)
984a925 skills: generalize tail shorthand parsing for line counts (mouxinqq, Apr 13, 2026)
a928338 Merge pull request #22 from mouxinqq/codex/update-stat-cache-hitratio… (mouxinqq, Apr 13, 2026)
b68181d [Feature] Add troubleshoot and stats-cache-hitratio skills (Apr 13, 2026)
87a7910 [Feature] Add troubleshoot and stats-cache-hitratio skills (Apr 13, 2026)
109a8e5 [Feature] Add skills and Add logging cleanup (Apr 13, 2026)
599085d Merge branch 'PaddlePaddle:develop' into develop (mouxinqq, Apr 13, 2026)
888b0ac [Feature] Add skills and logging cleanup (Apr 13, 2026)
e16652d [Feature] Add skills and logging cleanup (Apr 13, 2026)
38b6ea0 [Feature] Update logging cleanup (Apr 14, 2026)
5582779 [Feature] Update logging cleanup (Apr 14, 2026)
fd56b0a [Feature] Update logging cleanup (Apr 14, 2026)
cc38640 [Feature] Update logging cleanup (Apr 14, 2026)
06a886e [Feature] Update logging cleanup (Apr 14, 2026)
ce775c9 [Feature] Update logging cleanup (Apr 14, 2026)
ee0162a [Feature] Update logging cleanup (Apr 14, 2026)
cbdb548 [Feature] Update logging cleanup (Apr 14, 2026)
fd3c013 test(golang_router): cover cleanup loop and cross-day log rolling (mouxinqq, Apr 14, 2026)
f5d325b Merge pull request #23 from mouxinqq/codex/add-unit-test-for-startlog… (mouxinqq, Apr 14, 2026)
2 changes: 1 addition & 1 deletion docs/online_serving/router.md
@@ -194,7 +194,7 @@ scheduler:
policy: "power_of_two" # Scheduling policy (optional): random, power_of_two, round_robin, process_tokens, request_num, cache_aware, remote_cache_aware, fd_metrics_score, fd_remote_metrics_score
prefill-policy: "cache_aware" # Prefill scheduling policy in PD mode
decode-policy: "request_num" # Decode scheduling policy in PD mode
- eviction-interval-secs: 60 # Cache eviction interval for CacheAware scheduling
+ eviction-interval-secs: 60 # Counter eviction interval for CacheAware scheduling
eviction-duration-mins: 30 # Eviction duration for cache-aware radix tree nodes (minutes); default: 30
balance-abs-threshold: 1 # Absolute threshold for CacheAware balancing
balance-rel-threshold: 0.2 # Relative threshold for CacheAware balancing
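
`eviction-duration-mins` bounds how long an unused radix-tree node survives in the cache-aware index, while `eviction-interval-secs` is now documented as a counter eviction interval rather than a cache eviction interval. A minimal sketch of age-based node pruning, assuming each node stores its last-access time and that touching a node also touches its ancestors (`Node` and `prune_expired` are hypothetical names; the router's own code is not part of this diff):

```python
import time
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical radix-tree node: children keyed by their edge label (a token run)."""
    last_access: float
    children: Dict[str, "Node"] = field(default_factory=dict)


def prune_expired(node: Node, now: float, duration_mins: float) -> None:
    """Drop every subtree whose root has not been accessed within duration_mins.

    Assumes a parent is touched whenever a descendant is, so once a node is
    stale its whole subtree is stale too and can be removed in one step.
    """
    cutoff = now - duration_mins * 60
    for key in list(node.children):  # copy the keys: we delete while iterating
        child = node.children[key]
        if child.last_access < cutoff:
            del node.children[key]
        else:
            prune_expired(child, now, duration_mins)


now = time.time()
root = Node(last_access=now, children={
    "hello": Node(last_access=now - 10 * 60),  # used 10 minutes ago: kept
    "stale": Node(last_access=now - 45 * 60),  # used 45 minutes ago: evicted
})
prune_expired(root, now, duration_mins=30)  # eviction-duration-mins: 30
print(sorted(root.children))  # ['hello']
```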
41 changes: 36 additions & 5 deletions docs/online_serving/router_faq.md
@@ -29,6 +29,24 @@ For basic Router usage, please refer to [Load-Balancing Scheduling Router](route
| `empty baseURL provided` | Health check received an empty base URL | Health check cannot be performed | Registration parameters |
| `failed to create request: {error}` | Failed to create health check request | The instance may be marked as unhealthy | Network environment |
| `failed to read response body: {error}` | Failed to read health check response body | The instance may be marked as unhealthy | Backend instance status |
| `Failed to select mixed worker: {error}` | Failed to select Mixed worker in centralized mode | Current request returns 502 | Health status, scheduling strategy |
| `Failed to select prefill worker: {error}` | Failed to select Prefill worker in PD disaggregated mode | Current request returns 502 | Health status, scheduling strategy |
| `Failed to read register request body: {error}` | Failed to read registration request body | Registration request returns 400 | Request format |
| `Failed to unmarshal register request JSON: {error}` | Failed to parse registration request JSON | Registration request returns 400 | Request format |
| `Failed to create decode request for {url}: {error}` | Failed to create HTTP request to Decode instance | Current request fails | Network environment |
| `Failed to create prefill request for {url}: {error}` | Failed to create HTTP request to Prefill instance | Current request fails | Network environment |
| `Decode request failed for {url}: {error}` | Request to Decode instance failed | Current request fails | Backend instance status, network connectivity |
| `Prefill request failed for {url}: {error}` | Request to Prefill instance failed | Current request fails | Backend instance status, network connectivity |
| `Failed to read request body: {error}` | Failed to read inference request body | Current request returns 400 | Request format |
| `Failed to unmarshal request JSON: {error}` | Failed to parse inference request JSON | Current request returns 400 | Request format |
| `Failed to select worker pair: {error}` | Failed to select worker pair in PD disaggregated mode | Current request returns 502 | Health status, scheduling strategy |
| `Failed to build disaggregate_info: {error}` | Failed to build PD disaggregation communication info | Current request returns 500 | Registration parameters (connector_port, device_ids, etc.) |
| `Failed to encode modified request: {error}` | Failed to encode modified request body | Current request returns 500 | Request content |
| `Failed to select worker: {error}` | Failed to select worker in centralized mode | Current request returns 502 | Health status, scheduling strategy |
| `Failed to connect to backend service: {error}` | Failed to connect to backend inference instance (after 3 retries) | Current request returns 502 | Backend instance status, network connectivity |
| `Request failed (attempt {n}/{max}): {error}` | Request attempt {n} failed | If retries are exhausted, the request returns 502 | Backend instance status, network connectivity |
| `Failed to create backend request for {url}: {error}` | Failed to create HTTP request to backend | Current request fails | Network environment |
| `Backend request failed for {url}: {error}` | Request to backend instance failed | Current request fails | Backend instance status, network connectivity |
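
Every message in these tables uses `{placeholder}` fields, so one quick way to triage a router log is to compile each template into a regular expression with named groups. A minimal sketch; the `TEMPLATES` subset and the helper names are illustrative, and the impact strings are paraphrased from the table rather than emitted by the router:

```python
import re
from typing import Dict, Optional, Tuple

# A few Error-level templates from the table above, with their documented impact.
TEMPLATES = {
    "Failed to select worker pair: {error}": "request returns 502",
    "Failed to build disaggregate_info: {error}": "request returns 500",
    "Request failed (attempt {n}/{max}): {error}": "retried; 502 once retries are exhausted",
}


def template_to_regex(template: str) -> re.Pattern:
    """Escape the literal text and turn each {name} into a named capture group."""
    parts = re.split(r"\{(\w+)\}", template)  # literals at even indexes, names at odd
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(pattern + "$")  # anchor the end so the last field takes the rest


COMPILED = [(template_to_regex(t), t, impact) for t, impact in TEMPLATES.items()]


def match_log(line: str) -> Optional[Tuple[str, Dict[str, str], str]]:
    """Return (template, fields, impact) for the first template found in the line."""
    for regex, template, impact in COMPILED:
        m = regex.search(line)  # search, not match: tolerate a prefix such as a timestamp
        if m:
            return template, m.groupdict(), impact
    return None


hit = match_log("2026/04/13 10:00:00 Request failed (attempt 2/3): connection refused")
print(hit[1])  # {'n': '2', 'max': '3', 'error': 'connection refused'}
```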

### Warn-Level Logs

@@ -37,8 +55,9 @@ For basic Router usage, please refer to [Load-Balancing Scheduling Router](route
| `Server {url} is not healthy` | The instance at this URL failed health check | Router cannot register the instance, or will remove it from the registered list | Health status |
| `Instance {url} role is unknown` | Instance role cannot be recognized | The instance will not be added to the scheduling list | Registration parameters |
| `cache-aware prefill: tokenizer failed, fallback to char tokens: {error}` | Tokenizer service call failed, automatically falling back to character-based tokenization | cache_aware strategy remains active, using character-based tokenization for cache matching instead of the Tokenizer; normal request processing is not affected | Tokenizer service status |
- | `cache-aware prefill: tokenize failed, fallback to process_tokens: {error}` | Tokenization completely failed (e.g., empty input), falling back to process_tokens strategy | Prefill scheduling temporarily does not use cache_aware strategy; normal request processing is not affected | Request content, Tokenizer service status |
- | `cache-aware prefill: final strategy: process_tokens, reason: tokenize failed: {error}. ts_ms={ts}` | Tokenization failed (new format), falling back to process_tokens strategy | Prefill scheduling temporarily does not use cache_aware strategy; normal request processing is not affected | Request content, Tokenizer service status |
+ | `GetRemoteMetrics failed for {url}, falling back to local counter: {error}` | Failed to fetch remote metrics, falling back to local counter | Scheduling accuracy may decrease; normal request processing is not affected | Backend instance metrics port, network connectivity |
+ | `release worker: {url} skipped, counter already cleaned up` | Worker counter was already cleaned up when trying to release | May occur when a worker is removed by health check while requests are still in-flight | Health status, request timing |
+ | `release worker: {url} skipped, counter already zero (possible double-release)` | Worker counter is already zero when trying to release | Possible duplicate counter release | Request processing logic |
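
The two `release worker ... skipped` warnings describe guards around a per-worker in-flight counter. A minimal single-threaded model of those guards, assuming the counters live in a map keyed by worker URL (the `WorkerCounters` class is hypothetical; the actual golang_router code is not part of this diff and also has to handle concurrency):

```python
import logging
from typing import Dict

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("router-sketch")


class WorkerCounters:
    """Hypothetical in-flight request counters, one per worker URL."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def select(self, url: str) -> None:
        # A request was routed to this worker: one more request in flight.
        self.counts[url] = self.counts.get(url, 0) + 1

    def remove(self, url: str) -> None:
        # Health check dropped the worker and its counter, even with requests in flight.
        self.counts.pop(url, None)

    def release(self, url: str) -> bool:
        """Decrement the worker's counter; return False if the release was skipped."""
        if url not in self.counts:
            log.warning("release worker: %s skipped, counter already cleaned up", url)
            return False
        if self.counts[url] == 0:
            log.warning("release worker: %s skipped, counter already zero (possible double-release)", url)
            return False
        self.counts[url] -= 1
        log.info("release worker: %s, count: %d", url, self.counts[url])
        return True


c = WorkerCounters()
c.select("http://w1")
c.release("http://w1")  # INFO: count drops to 0
c.release("http://w1")  # WARNING: counter already zero (double release)
c.select("http://w2")
c.remove("http://w2")   # worker removed while its request is still in flight
c.release("http://w2")  # WARNING: counter already cleaned up
```

This sketch removes counters unconditionally so that both warnings can be reproduced; the Info-level table also describes counters being preserved while an unhealthy worker still has in-flight requests.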

### Info-Level Logs

@@ -49,7 +68,6 @@ For basic Router usage, please refer to [Load-Balancing Scheduling Router](route
| `Successfully registered instance from index {index}` | Instance from config file registered successfully | Normal startup log |
| `No instances found in config file {path}` | No instances found in the registration config file | Check whether register.yaml is empty |
| `Request completed successfully.` | Request processing completed | Normal operation log |
- | `Request failed, retrying...` | Request failed, retrying | Router will retry up to 3 times |
| `select worker (prefill): {url}, tokens: {tokens}` | Prefill scheduler selected a worker, showing current token processing count | Normal operation log |
| `select worker ({type}): {url}, count: {count}` | Decode/Mixed scheduler selected a worker, showing current request concurrency | Normal operation log |
| `release worker: {url}, count: {count}` | Request ended, worker counter released | Normal operation log |
@@ -58,20 +76,24 @@ For basic Router usage, please refer to [Load-Balancing Scheduling Router](route
| `removed counters for {count} unhealthy workers: {urls}` | Batch cleanup of counters for unhealthy workers | Normal operation log |
| `[stats] total_running={n}, workers: [{loads}], cache_hit_rate={rate}% (hits={hits}/total={total})` | Periodic stats: total requests, worker loads, cache hit rate | Normal operation log, useful for monitoring and tuning |
| `Parsing completed; starting worker selection.` | Request parsing completed, starting worker selection | Normal operation log |
| `Request completed with an error.` | Request processing completed with an error | Check backend instance status |
| `[SelectWorkerPair] decode selection failed, releasing prefill counter url={url}` | Decode selection failed in PD disaggregated mode, releasing Prefill counter | Error handling log |
| `[prefill] first chunk received, release counter url={url}` | Prefill streaming response received first chunk, counter released | Normal operation log |
| `[prefill] non-stream prefill response done, release counter url={url}` | Prefill non-streaming response completed, counter released | Normal operation log |
| `[prefill] backendResp is nil or backendResp.Body is nil, url={url}` | Prefill backend response is nil | May indicate backend connection issue |
| `[prefill] release in defer (fallback) url={url}, isStream={bool}` | Fallback resource release when Prefill request exits abnormally | Error handling log |
| `[prefill] release in CommonCompletions defer (error path) url={url}` | Prefill resource release on error path | Error handling log |
| `cache-aware prefill: final strategy: process_tokens, reason: strategy not initialized` | cache_aware strategy not initialized, falling back to process_tokens | Check cache_aware configuration |
| `cache-aware prefill: final strategy: process_tokens, reason: tokenize failed: {error}. ts_ms={ts}` | Tokenization failed, falling back to process_tokens strategy | Prefill scheduling temporarily does not use cache_aware strategy; normal request processing is not affected |
| `cache-aware prefill: final strategy: process_tokens, reason: load imbalanced, loads={loads}. ts_ms={ts}` | Load imbalanced across instances, falling back to process_tokens strategy | Normal operation log, automatic load balancing switch |
| `cache-aware prefill: final strategy: cache_aware_scoring, selected={url}, loads={loads}, hitRatios={ratios}. ts_ms={ts}` | cache_aware scoring strategy selected a worker | Normal operation log, showing loads and hit ratios |
| `[{method}] {path} {proto} {status} {latency} {clientIP}` | HTTP request access log | Normal operation log, records basic info for each request |
| `before SelectWorker prefill. ts_ms={ts}` | Starting Prefill worker selection in PD disaggregated mode | Normal operation log, for performance tracing |
| `before SelectWorker decode, after prefill. ts_ms={ts}` | Starting Decode worker selection after Prefill selection | Normal operation log, for performance tracing |
| `after SelectWorker decode, before return. ts_ms={ts}` | Decode worker selection completed | Normal operation log, for performance tracing |
| `unhealthy worker counter preserved (inflight requests): {url}, count: {count}` | Unhealthy worker still has in-flight requests, counter temporarily preserved | Normal operation log, will be auto-cleaned after in-flight requests complete |
| `unhealthy worker token counter preserved (inflight requests): {url}, tokens: {tokens}` | Unhealthy worker still has in-flight token load, token counter temporarily preserved | Normal operation log, will be auto-cleaned after in-flight requests complete |
| `cleanup unhealthy worker token counter: {url}` | Cleaned up token counter for unhealthy worker | Normal operation log |
| `preserved counters for {count} workers with inflight requests: {urls}` | Batch preservation of counters for workers with in-flight requests | Normal operation log |
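
The `[stats]` line is the one most worth scraping for monitoring. A minimal parser sketch, assuming the line follows the template in the table exactly; the per-worker load list is kept as raw text because its inner format is not documented here, and the sample line is made up for illustration:

```python
import re
from typing import Any, Dict, Optional

STATS_RE = re.compile(
    r"\[stats\] total_running=(?P<total_running>\d+), "
    r"workers: \[(?P<workers>.*)\], "
    r"cache_hit_rate=(?P<rate>[\d.]+)% "
    r"\(hits=(?P<hits>\d+)/total=(?P<total>\d+)\)"
)


def parse_stats(line: str) -> Optional[Dict[str, Any]]:
    """Extract the fields of a [stats] line, or return None if the line is not one."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    hits, total = int(m["hits"]), int(m["total"])
    return {
        "total_running": int(m["total_running"]),
        "workers": m["workers"],  # raw text; split it once the real format is known
        "cache_hit_rate": float(m["rate"]),
        "hits": hits,
        "total": total,
        # Cross-check the printed percentage against the raw counts (avoid dividing by 0).
        "recomputed_rate": 100.0 * hits / total if total else 0.0,
    }


sample = "[stats] total_running=5, workers: [w1:3 w2:2], cache_hit_rate=62.50% (hits=5/total=8)"
print(parse_stats(sample)["recomputed_rate"])  # 62.5
```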

### Debug-Level Logs

@@ -100,6 +122,10 @@ For basic Router usage, please refer to [Load-Balancing Scheduling Router](route
| `{"error": "Failed to build disaggregate_info"}` | 500 | Failed to build PD disaggregation communication info | Registration parameters (connector_port, device_ids, etc.) |
| `{"error": "Invalid request body"}` | 400 | Failed to read request body | Request format |
| `{"error": "Invalid JSON format"}` | 400 | Failed to parse request body JSON | Request format |
| `{"error": "Failed to encode modified request: {error}"}` | 500 | Failed to encode modified request body | Request content |
| `{"code": 500, "msg": "Internal server error"}` | 500 | A panic occurred during request processing and was recovered | Backend instance status, request content |

> **Note**: In PD disaggregated (splitwise) mode, the above error responses include an additional `request_id` field, e.g., `{"error": "...", "request_id": "xxx"}`. Additionally, `Invalid request body` and `Invalid JSON format` responses include specific error details, e.g., `{"error": "Invalid request body: EOF"}`.
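
Clients therefore see two JSON shapes: `{"error": ...}` (with `request_id` in PD disaggregated mode, and a detail suffix on the two `Invalid ...` messages) and `{"code": ..., "msg": ...}`. A minimal client-side sketch that normalizes both; the function names are hypothetical:

```python
import json
from typing import Optional, Tuple


def parse_router_error(body: str) -> Tuple[str, Optional[str]]:
    """Return (message, request_id) from a Router error body of either shape."""
    data = json.loads(body)
    message = data.get("error") or data.get("msg") or ""
    return message, data.get("request_id")  # request_id is only documented for PD mode


def is_bad_request(message: str) -> bool:
    # These two messages may carry a detail suffix, e.g. "Invalid request body: EOF",
    # so compare by prefix rather than equality.
    return message.startswith(("Invalid request body", "Invalid JSON format"))


msg, rid = parse_router_error('{"error": "Invalid request body: EOF", "request_id": "xxx"}')
print(msg, rid, is_bad_request(msg))  # Invalid request body: EOF xxx True
```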

### Registration Request Errors (/register)

@@ -111,6 +137,7 @@ For basic Router usage, please refer to [Load-Balancing Scheduling Router](route
| `{"code": 400, "msg": "splitwise mode only supports PREFILL/DECODE instances"}` | 400 | MIXED instances are not allowed in PD disaggregated mode | Deployment mode, instance role |
| `{"code": 400, "msg": "only MIXED instances are allowed"}` | 400 | Only MIXED instances are allowed in centralized mode | Deployment mode, instance role |
| `{"code": 400, "msg": "invalid InstanceInfo format: {error}"}` | 400 | Instance registration info validation failed | Registration parameters |
| `{"code": 400, "msg": "DefaultManager is nil"}` | 400 | Router internal manager not initialized | Router startup status |
| `{"code": 200, "msg": "Register success"}` | 200 | Registration successful | — |
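
The two role rules above reduce to one check per deployment mode. A minimal sketch, assuming the role arrives as one of the upper-case strings used in these messages (`check_role` is hypothetical, and unrecognized roles, which the Warn table reports separately, are not modeled):

```python
from typing import Optional


def check_role(splitwise: bool, role: str) -> Optional[str]:
    """Return the documented 400 message for a disallowed role, or None if allowed."""
    if splitwise and role not in ("PREFILL", "DECODE"):
        return "splitwise mode only supports PREFILL/DECODE instances"
    if not splitwise and role != "MIXED":
        return "only MIXED instances are allowed"
    return None


for mode, role in [(True, "PREFILL"), (True, "MIXED"), (False, "MIXED"), (False, "DECODE")]:
    print(mode, role, check_role(mode, role) or "allowed")
```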

### Common Registration Parameter Validation Errors
@@ -124,6 +151,10 @@ For basic Router usage, please refer to [Load-Balancing Scheduling Router](route
| `port is required` | Missing port field | Add the port field |
| `invalid port: {port}` | port is not a valid port number | Provide a port number in the range 1-65535 |
| `invalid protocol: {protocol}` | Invalid transfer protocol | Use a valid protocol value: ipc / rdma |
| `invalid connector_port: {port}` | connector_port is not a valid port number | Provide a port number in the range 1-65535 |
| `invalid engine_worker_queue_port: {port}` | engine_worker_queue_port is not a valid port number | Provide a port number in the range 1-65535 |
| `invalid metrics_port: {port}` | metrics_port is not a valid port number | Provide a port number in the range 1-65535 |
| `rdma_ports[{index}] invalid port: {port}` | Port at index {index} in RDMA ports list is not valid | Provide a port number in the range 1-65535 |
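
All of the port errors above apply the same 1-65535 range rule, so a client can run a pre-flight check before calling `/register`. A minimal sketch, assuming ports are sent as JSON integers and optional fields are simply omitted when unused (`validate_ports` is hypothetical and only reuses the message formats from the table):

```python
from typing import Any, Dict, List


def _valid_port(value: Any) -> bool:
    # A usable TCP port is an integer in 1..65535 (bool is excluded explicitly).
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 65535


def validate_ports(info: Dict[str, Any]) -> List[str]:
    """Collect port-related errors using the message formats from the table above."""
    errors = []
    if "port" not in info:
        errors.append("port is required")
    elif not _valid_port(info["port"]):
        errors.append(f"invalid port: {info['port']}")
    for key in ("connector_port", "engine_worker_queue_port", "metrics_port"):
        if key in info and not _valid_port(info[key]):
            errors.append(f"invalid {key}: {info[key]}")
    for i, p in enumerate(info.get("rdma_ports", [])):
        if not _valid_port(p):
            errors.append(f"rdma_ports[{i}] invalid port: {p}")
    return errors


print(validate_ports({"port": 8000, "connector_port": 70000, "rdma_ports": [8001, 0]}))
# ['invalid connector_port: 70000', 'rdma_ports[1] invalid port: 0']
```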

## Troubleshooting Guide

@@ -236,7 +267,7 @@ If `Failed to start server` appears in startup logs, check:
When using the `cache_aware` scheduling strategy, the Router calls a Tokenizer service to tokenize requests for cache hit ratio computation. When the Tokenizer service is unavailable, the Router has a two-level degradation mechanism:

1. **Fallback to character-based tokenization** (common case): The log will show `tokenizer failed, fallback to char tokens`. The cache_aware strategy remains active, using character-based tokenization for cache matching instead of the Tokenizer. Cache hit accuracy may decrease, but normal request processing is not affected.
- 2. **Fallback to process_tokens strategy** (extreme case): When tokenization completely fails (e.g., empty request content), the log will show `tokenize failed, fallback to process_tokens`. The cache_aware strategy temporarily becomes inactive, and scheduling falls back to token processing volume. Normal request processing is not affected.
+ 2. **Fallback to process_tokens strategy** (extreme case): When tokenization completely fails (e.g., empty request content), the log will show `cache-aware prefill: final strategy: process_tokens, reason: tokenize failed: {error}. ts_ms={ts}` (Info level). The cache_aware strategy temporarily becomes inactive, and scheduling falls back to token processing volume. Normal request processing is not affected.
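
The two fallback levels can be summarized as a small decision function. A minimal sketch, assuming the Tokenizer service is modeled as a Python callable that raises on failure and that character-level tokens are simply the prompt's characters (`choose_prefill_strategy` is hypothetical; it reproduces neither the router's scoring nor its log format):

```python
from typing import Callable, List, Sequence, Tuple, Union

Token = Union[int, str]


def choose_prefill_strategy(
    prompt: str, tokenizer: Callable[[str], Sequence[int]]
) -> Tuple[str, List[Token]]:
    """Return (strategy, tokens) under the two-level degradation described above."""
    try:
        tokens: List[Token] = list(tokenizer(prompt))  # normal path: Tokenizer service
    except Exception:
        # Level 1 ("tokenizer failed, fallback to char tokens"): stay on cache_aware,
        # but match the cache on characters instead of real tokens.
        tokens = list(prompt)
    if not tokens:
        # Level 2 ("final strategy: process_tokens, reason: tokenize failed"):
        # nothing to match, so schedule by token processing volume instead.
        return "process_tokens", []
    return "cache_aware", tokens


def unreachable_tokenizer(text: str) -> Sequence[int]:
    raise ConnectionError("tokenizer service unreachable")


print(choose_prefill_strategy("hi", lambda s: [101, 102]))  # ('cache_aware', [101, 102])
print(choose_prefill_strategy("hi", unreachable_tokenizer))  # ('cache_aware', ['h', 'i'])
print(choose_prefill_strategy("", unreachable_tokenizer))    # ('process_tokens', [])
```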

To restore full cache_aware functionality:

2 changes: 1 addition & 1 deletion docs/zh/online_serving/router.md
@@ -194,7 +194,7 @@ scheduler:
policy: "power_of_two" # Scheduling policy (optional): random, power_of_two, round_robin, process_tokens, request_num, cache_aware, remote_cache_aware, fd_metrics_score, fd_remote_metrics_score; default: request_num
prefill-policy: "cache_aware" # Prefill node scheduling policy in PD disaggregated mode; default: process_tokens
decode-policy: "request_num" # Decode node scheduling policy in PD disaggregated mode; default: request_num
- eviction-interval-secs: 60 # Interval at which the cache-aware policy cleans up expired cache
+ eviction-interval-secs: 60 # Interval at which the cache-aware policy cleans up expired counters
eviction-duration-mins: 30 # Eviction time for cache-aware radix tree nodes (minutes); default: 30
balance-abs-threshold: 1 # Absolute threshold for the cache-aware policy
balance-rel-threshold: 0.2 # Relative threshold for the cache-aware policy