feat: improved error messages + hedge racing + retry enhancements #505
Open
jorgecuesta wants to merge 3 commits into main
- Add qos/heuristic package for tiered response analysis:
  - Tier 1: Structural checks (empty, HTML, XML, non-JSON, malformed)
  - Tier 2: Protocol-specific analysis (JSON-RPC result vs error fields)
  - Tier 3: Error pattern detection (blockchain-specific, rate limits, etc.)
- Fix backend 5xx reputation bug in protocol/shannon/context.go:
  - Backend 5xx responses now record CriticalErrorSignal (-25) instead of success
  - 4xx errors not penalized (often user's fault)
- Wire heuristic detection to reputation in gateway:
  - When heuristic detects payload error despite HTTP 200, record correcting signal
  - High confidence errors: MajorError (-10)
  - Lower confidence errors: MinorError (-3)
- Cleanup: Remove unused hydrateLogger and hydrateLoggerWithPayload functions
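The reputation rules in this commit amount to a small decision table. The Go sketch below is illustrative only: `Signal`, `HeuristicResult`, `penaltyFor` and the 0.8 confidence threshold are hypothetical names and values, not PATH's actual types. Only the -25/-10/-3 deltas and the 5xx, 4xx and HTTP 200 rules come from the commit message.

```go
package main

import "fmt"

// Signal is a hypothetical stand-in for a reputation signal. The deltas mirror
// the values quoted in the commit message; the type itself is illustrative.
type Signal struct {
	Name  string
	Delta int
}

var (
	CriticalError = Signal{Name: "critical_error", Delta: -25}
	MajorError    = Signal{Name: "major_error", Delta: -10}
	MinorError    = Signal{Name: "minor_error", Delta: -3}
)

// HeuristicResult is a hypothetical summary of payload analysis.
type HeuristicResult struct {
	IsError    bool
	Confidence float64 // 0.0 to 1.0
}

// highConfidence is an assumed cut-off; the PR does not state one.
const highConfidence = 0.8

// penaltyFor returns the signal to record for a relay and whether any
// penalty applies at all.
func penaltyFor(status int, h HeuristicResult) (Signal, bool) {
	switch {
	case status >= 500:
		// Backend 5xx: previously recorded as success, now critical.
		return CriticalError, true
	case status >= 400:
		// 4xx: often caused by the client's request, so not penalized.
		return Signal{}, false
	case status == 200 && h.IsError && h.Confidence >= highConfidence:
		// HTTP 200 carrying a payload error: correcting signal.
		return MajorError, true
	case status == 200 && h.IsError:
		return MinorError, true
	default:
		return Signal{}, false
	}
}

func main() {
	cases := []struct {
		status int
		h      HeuristicResult
	}{
		{502, HeuristicResult{}},
		{400, HeuristicResult{}},
		{200, HeuristicResult{IsError: true, Confidence: 0.95}},
		{200, HeuristicResult{IsError: true, Confidence: 0.5}},
		{200, HeuristicResult{}},
	}
	for _, c := range cases {
		s, penalize := penaltyFor(c.status, c.h)
		fmt.Printf("status=%d penalize=%v signal=%q delta=%d\n", c.status, penalize, s.Name, s.Delta)
	}
}
```

Ordering the cases by status first means a heuristic verdict can only downgrade an HTTP 200, never override the 5xx or 4xx handling.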
Branch updated from 5b4d352 to f8cb9ee
Protocol Error Propagation:
- Add SetProtocolError to RequestQoSContext interface for specific error messages
- Implement in all QoS contexts (EVM, Cosmos, Solana, NoOp, HealthCheck)
- Replace generic "no endpoint responses received" with specific errors like "no valid endpoints available for service" or endpoint-specific details

Hedge Racing (New Feature):
- Add hedge_delay config to spawn a parallel request after an initial delay
- Primary request races against hedge; first success wins
- Configurable connect_timeout for fast-fail on connection issues
- Track hedge results via X-Hedge-Result response header

Retry Enhancements:
- Add max_retry_latency to skip retries when the request already took too long
- Add retry endpoint rotation to try different endpoints on each retry
- Track retry count and suppliers tried in response headers
- Add heuristic-based retry detection for JSON-RPC errors in HTTP 200

Response Metadata Headers:
- X-Retry-Count: number of retry attempts
- X-Suppliers-Tried: comma-separated list of attempted suppliers
- X-Hedge-Result: outcome of hedge racing (primary_won, hedge_won, etc.)
- X-App-Address, X-Supplier-Address, X-Session-ID for debugging

Heuristic Response Analysis:
- Detect errors in response payloads despite HTTP 200 status
- Identify JSON-RPC errors, HTML error pages, empty responses
- Record correcting reputation signals for heuristic-detected failures
- Support retry decisions based on payload analysis

Health Check Improvements:
- Add response heuristic analysis to health checks
- Record heuristic errors to reputation service
- Better logging for health check validation

Metrics:
- Add retry distribution metrics by reason (5xx, timeout, connection, heuristic)
- Add hedge racing outcome metrics
- Add batch size tracking metrics

Config:
- Add retry_config with hedge_delay, connect_timeout, max_retry_latency
- Update schema and example configs
This PR introduces several reliability and observability improvements to PATH:
1. Protocol Error Propagation
- Previously, protocol failures reached clients as the generic `"protocol-level error: no endpoint responses received"`.
- Add `SetProtocolError` to the `RequestQoSContext` interface to propagate specific errors from the protocol layer to client responses, for example:
  - `"no valid endpoints available for service: service X"`
  - `"selected endpoint is not available: relay request will fail: service X endpoint pokt1..."`

2. Hedge Racing (New Feature)
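- A parallel hedge request is spawned once `retry_config.hedge_delay` has elapsed; the primary and the hedge race and the first success wins
- `retry_config.connect_timeout` fails fast on connection issues
- Hedge outcomes are reported in the `X-Hedge-Result` header

3. Retry Enhancements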
- `max_retry_latency` skips retries when the failed request already took too long
- Each retry rotates to a different endpoint
- Attempts are reported in the `X-Retry-Count` and `X-Suppliers-Tried` headers

4. Heuristic Response Analysis
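The three heuristic tiers can be read as a short-circuiting pipeline: cheap structural checks first, then JSON-RPC envelope inspection, then substring matching. A minimal Go sketch of that ordering follows; `Verdict`, `analyze`, the pattern list and every confidence value are assumptions for illustration, not the contents of the `qos/heuristic` package.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// Verdict is a hypothetical result type for one analyzed response body.
type Verdict struct {
	IsError    bool
	Tier       int // 0 when no problem was found
	Reason     string
	Confidence float64 // illustrative values only
}

// errorPatterns is an assumed, deliberately short list for tier 3.
var errorPatterns = []string{"rate limit", "too many requests", "missing trie node"}

// analyze runs the tiers in order and stops at the first one that flags an error.
func analyze(body []byte) Verdict {
	b := bytes.TrimSpace(body)

	// Tier 1: structural checks.
	switch {
	case len(b) == 0:
		return Verdict{IsError: true, Tier: 1, Reason: "empty", Confidence: 0.95}
	case b[0] == '<': // HTML error pages and XML documents
		return Verdict{IsError: true, Tier: 1, Reason: "html_or_xml", Confidence: 0.95}
	case b[0] != '{' && b[0] != '[':
		return Verdict{IsError: true, Tier: 1, Reason: "non_json", Confidence: 0.9}
	case !json.Valid(b):
		return Verdict{IsError: true, Tier: 1, Reason: "malformed_json", Confidence: 0.9}
	}

	// Tier 2: JSON-RPC "error" vs "result" (single objects only; batches fall through).
	if b[0] == '{' {
		var fields map[string]json.RawMessage
		if json.Unmarshal(b, &fields) == nil {
			if e, ok := fields["error"]; ok && len(e) > 0 && string(e) != "null" {
				return Verdict{IsError: true, Tier: 2, Reason: "jsonrpc_error", Confidence: 0.9}
			}
			if _, ok := fields["result"]; ok {
				// A present result, even null, is a valid answer for many methods.
				return Verdict{Reason: "ok"}
			}
		}
	}

	// Tier 3: substring patterns; lowest confidence because it can misfire.
	lower := strings.ToLower(string(b))
	for _, p := range errorPatterns {
		if strings.Contains(lower, p) {
			return Verdict{IsError: true, Tier: 3, Reason: "pattern: " + p, Confidence: 0.5}
		}
	}
	return Verdict{Reason: "ok"}
}

func main() {
	samples := []string{
		``,
		`<html><body>502 Bad Gateway</body></html>`,
		`{"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"header not found"}}`,
		`{"jsonrpc":"2.0","id":1,"result":"0x1b4"}`,
		`{"message":"Rate limit exceeded"}`,
	}
	for _, s := range samples {
		fmt.Printf("%q -> %+v\n", s, analyze([]byte(s)))
	}
}
```

Tier 3 runs only when no JSON-RPC `result` field is present, which keeps a legitimate payload that happens to contain a phrase like "rate limit" from being flagged.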
5. Response Metadata Headers
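| Header | Description |
|---|---|
| `X-Retry-Count` | Number of retry attempts |
| `X-Suppliers-Tried` | Comma-separated list of attempted suppliers |
| `X-Hedge-Result` | Hedge racing outcome: `primary_only`, `primary_won`, `hedge_won`, `both_failed` |
| `X-App-Address` | Application address (debugging) |
| `X-Supplier-Address` | Supplier address (debugging) |
| `X-Session-ID` | Session ID (debugging) |

6. Health Check Improvements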
Configuration
Test Plan