This document outlines best practices for interacting with the LLMs integrated into the AWS V2 Hybrid Intelligence Platform. Following these guidelines will help you maximize AI performance, control costs, and achieve reliable structured outputs.
- 3-Layer System Prompt Architecture: The `MCPAgent` system prompt is dynamically assembled from three layers:
  - `mcp_agent_system.md` (Protocol Layer): Human-editable Markdown template. Defines the ReAct loop logic, JSON call formats, and autonomous output rules.
  - `src/intelligence/prompts/config/` (Knowledge Layer - SSOT): YAML-based repository for Expert Roles, Analysis Frameworks (PSI, SWOT), and Report Templates. This ensures consistent business logic across both Agent and Workflow tracks.
  - `PromptManager` (Assembly Layer): Orchestrates the injection of values from `config/workflow_defaults.yaml` into YAML templates using `$variable` syntax, ensuring AI standards are always synchronized with system thresholds.
- Execution Phases: The template guides the LLM through 5 phases: COLLECT → FILTER → ENRICH → ANALYZE → OUTPUT. Not all phases are required for every task.
- Autonomous Output Rules: Organized into General Principles (e.g., discover IDs via tools) and Feishu-Specific Rules (e.g., Attachment-First Policy for long reports, Bitable automation). This ensures the agent adapts its output strategy based on the platform.
- Config-Driven Prompts: Business "red lines" (e.g., `high_monopoly_score`, `ad_traffic_ratio_max`) are defined once in `config/workflow_defaults.yaml`. Changing a value there automatically updates the reasoning standards in all AI-generated reports.
- Ad Dependency Red-lines: The platform now enforces an Advertising Dependency Policy. If an ASIN's `advertisingTrafficScoreRatio` (from XiyouZhaoci) exceeds the threshold (default 35%), the LLM should flag it as high-risk because it lacks an organic "moat".
- Tool Disambiguation: Similar tools are explicitly distinguished in their descriptions (e.g., `search_products` = Amazon direct search; `xiyou_keyword_analysis` = third-party Xiyouzhaoci database).
- Negative Constraints: Only use parameters in the tool's schema. One tool call per turn. No hallucinated data.
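
The config-driven idea above can be sketched with Python's standard `string.Template`, which uses the same `$variable` placeholder style. This is only an illustration: the dictionary stands in for values loaded from `config/workflow_defaults.yaml`, the prompt wording, the `high_monopoly_score` value, and the helper names are hypothetical, and `PromptManager` may well assemble prompts differently.

```python
from string import Template

# Hypothetical stand-in for values loaded from config/workflow_defaults.yaml.
workflow_defaults = {
    "high_monopoly_score": 0.7,    # illustrative value, not taken from the platform
    "ad_traffic_ratio_max": 0.35,  # the 35% default mentioned above
}

# Hypothetical prompt fragment using $variable placeholders.
RED_LINE_TEMPLATE = Template(
    "Flag an ASIN as high-risk when advertisingTrafficScoreRatio exceeds "
    "$ad_traffic_ratio_max or the monopoly score exceeds $high_monopoly_score."
)


def render_red_lines(defaults: dict) -> str:
    """Inject threshold values into the prompt fragment."""
    # substitute() raises KeyError for a missing key, so config drift fails loudly.
    return RED_LINE_TEMPLATE.substitute(defaults)


def is_ad_dependent(ratio: float, defaults: dict) -> bool:
    """Apply in code the same threshold the prompt states in words."""
    return ratio > defaults["ad_traffic_ratio_max"]


prompt_text = render_red_lines(workflow_defaults)
```

Because the prompt text and the code-side check read the same dictionary, changing one threshold keeps both in step, which is the point of defining red lines once.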
- Cloud-Only Token Budget: The `MCPAgent` tracks cumulative cloud token usage separately from local tokens. Only cloud API calls (Gemini, Claude) count toward the budget (default: 50,000 tokens). Local model tokens are free and unlimited.
- Gemini Advanced Pricing Support: The `PriceManager` accurately handles modern Gemini billing features:
  - Thinking Tokens: For models like `gemini-2.0-flash-thinking`, the `thoughts_token_count` is included in the output token billing.
  - Prompt Caching: If a request hits the Gemini cache, the `cached_content_token_count` is automatically deducted from the regular input count and billed at a significantly lower `cache_read` rate.
  - Tiered Pricing: Automatic switching between `lte_200k` and `gt_200k` pricing tiers based on the prompt size.
- Budget Enforcement: When cloud token usage exceeds the budget, the agent forces a final summary from the data collected so far and notifies the user that the remaining work will use the batch API. The agent does NOT hard-fail.
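
The Gemini billing rules listed above (thinking tokens, cache reads, tiered rates) can be sketched as a small cost function. This is a sketch under assumptions, not the `PriceManager` implementation: the rates are placeholder numbers rather than real Gemini prices, the 200,000-token tier boundary is inferred from the tier names, and the `Usage` class and `estimate_cost` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts, named after the Gemini usage metadata fields."""
    prompt_token_count: int
    candidates_token_count: int
    thoughts_token_count: int = 0
    cached_content_token_count: int = 0


# Placeholder rates in USD per 1M tokens; illustrative only, not real prices.
RATES = {
    "lte_200k": {"input": 1.0, "output": 4.0, "cache_read": 0.25},
    "gt_200k": {"input": 2.0, "output": 8.0, "cache_read": 0.5},
}


def estimate_cost(usage: Usage) -> float:
    """Estimate the cost of one request in USD."""
    # Tiered pricing: choose the tier from the prompt size (assumed boundary).
    tier = "lte_200k" if usage.prompt_token_count <= 200_000 else "gt_200k"
    rates = RATES[tier]
    # Prompt caching: cached tokens leave the regular input count
    # and are billed at the cheaper cache_read rate.
    regular_input = usage.prompt_token_count - usage.cached_content_token_count
    # Thinking tokens are billed together with output tokens.
    billed_output = usage.candidates_token_count + usage.thoughts_token_count
    total = (
        regular_input * rates["input"]
        + usage.cached_content_token_count * rates["cache_read"]
        + billed_output * rates["output"]
    )
    return total / 1_000_000
```
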
- Leverage Local Models for Pre-processing: For large text inputs (e.g., raw HTML, long customer reviews), use the `IntelligenceRouter` to automatically dispatch simple cleaning or summarization tasks to the local Llama.cpp model first.
- Independent Routing: Every item in a batch is independently classified and routed, preventing complex tasks from being misrouted to simpler models.
- Token Tracking Fields:
  - `session.token_usage`: total tokens across all providers (informational).
  - `session.cloud_token_usage`: cloud-only tokens (budget-relevant).
- GeminiProvider Token Reporting: Now reads `usage_metadata.total_token_count` from responses, with a `count_tokens()` fallback for older API versions.
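
The split between total and cloud-only token tracking, together with the soft budget check, might look like the following minimal sketch. The `Session` class, `record_usage`, `next_action`, and the provider name strings are hypothetical; only the two field names and the 50,000-token default come from this guide.

```python
from dataclasses import dataclass

# Assumed provider identifiers; the platform's actual names may differ.
CLOUD_PROVIDERS = {"gemini", "claude"}
DEFAULT_CLOUD_BUDGET = 50_000  # default cloud budget stated in this guide


@dataclass
class Session:
    token_usage: int = 0        # all providers, informational
    cloud_token_usage: int = 0  # cloud only, counted against the budget


def record_usage(session: Session, provider: str, tokens: int) -> None:
    """Add tokens to the total, and to the cloud counter for cloud providers."""
    session.token_usage += tokens
    if provider in CLOUD_PROVIDERS:
        session.cloud_token_usage += tokens


def next_action(session: Session, budget: int = DEFAULT_CLOUD_BUDGET) -> str:
    """Return a soft action instead of raising: the agent does not hard-fail."""
    if session.cloud_token_usage > budget:
        return "force_final_summary"
    return "continue"
```

In this sketch, a session that spends 100,000 local tokens stays within budget, because only the cloud counter is compared with the limit.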
- Automatic Routing: The `IntelligenceRouter` (`src/intelligence/router/`) automatically selects the best model based on task classification.
  - Local LLM (`llama.cpp`): Ideal for `SIMPLE_CLEANING`, `DATA_EXTRACTION`. Cost-free, low latency for small tasks.
  - Cloud LLM (Gemini / Claude): Best for `DEEP_REASONING`, complex `CREATIVE_WRITING`, and structured outputs requiring high accuracy. The router defaults to Gemini for cloud tasks; Claude is available as an alternative via `DEFAULT_LLM_PROVIDER=claude`.
- Override: For specific needs, you can temporarily override the router's decision by explicitly passing a `category` to `route_and_execute`.
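
The routing behavior described above (classification, local versus cloud backends, the `DEFAULT_LLM_PROVIDER` switch, and a category override) can be illustrated with a toy router. Everything here is hypothetical: the real `IntelligenceRouter` classifier and the signature of `route_and_execute` are not shown in this guide, and the length-based `classify` function is only a placeholder.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class TaskCategory(Enum):
    SIMPLE_CLEANING = "SIMPLE_CLEANING"
    DATA_EXTRACTION = "DATA_EXTRACTION"
    DEEP_REASONING = "DEEP_REASONING"
    CREATIVE_WRITING = "CREATIVE_WRITING"


LOCAL_CATEGORIES = {TaskCategory.SIMPLE_CLEANING, TaskCategory.DATA_EXTRACTION}


def classify(text: str) -> TaskCategory:
    """Placeholder classifier: short inputs are treated as simple cleaning."""
    if len(text) < 200:
        return TaskCategory.SIMPLE_CLEANING
    return TaskCategory.DEEP_REASONING


def choose_backend(
    text: str,
    category: Optional[TaskCategory] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a backend; an explicit category overrides the classifier."""
    env = os.environ if env is None else env
    resolved = category if category is not None else classify(text)
    if resolved in LOCAL_CATEGORIES:
        return "llama.cpp"
    # Cloud tasks default to Gemini unless DEFAULT_LLM_PROVIDER says otherwise.
    return env.get("DEFAULT_LLM_PROVIDER", "gemini")


def route_batch(items: list, env: Optional[Mapping[str, str]] = None) -> list:
    """Route every item independently, so one item never decides for another."""
    return [choose_backend(item, env=env) for item in items]
```
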
- Structured Output: Always aim for structured outputs (Pydantic models) to minimize hallucinations. This forces the LLM to adhere to a schema.
- Grounding with Data: Provide concrete, factual data from extractors and processors to anchor the LLM's responses.
- Verification: Implement post-processing checks where possible to validate LLM output against business rules.
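
A post-processing check of the kind described above could be sketched as follows. To stay dependency-free, a standard-library dataclass stands in for the Pydantic model the platform uses; the `AsinRisk` fields, the JSON shape, and the business rule wording are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class AsinRisk:
    """Hypothetical schema for one LLM verdict (stand-in for a Pydantic model)."""
    asin: str
    ad_traffic_ratio: float
    risk_level: str


def parse_and_verify(raw: str, ad_traffic_ratio_max: float = 0.35) -> AsinRisk:
    """Parse raw LLM JSON, then validate it against business rules."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    result = AsinRisk(
        asin=str(data["asin"]),  # KeyError signals a schema mismatch
        ad_traffic_ratio=float(data["ad_traffic_ratio"]),
        risk_level=str(data["risk_level"]),
    )
    if not 0.0 <= result.ad_traffic_ratio <= 1.0:
        raise ValueError("ad_traffic_ratio must be between 0 and 1")
    # Business rule: a ratio above the red line must be reported as high risk.
    if result.ad_traffic_ratio > ad_traffic_ratio_max and result.risk_level != "high":
        raise ValueError("ratio above threshold but risk_level is not 'high'")
    return result
```

Rejecting an inconsistent verdict in code means a response that contradicts the grounding data is caught before it reaches a report.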
- Detailed Logging: Enable `DEBUG` level logging for `src.intelligence.providers.*` and `src.intelligence.router` to see raw prompts and responses.
- Raw Output Review: When structured parsing fails, check the raw JSON output from the LLM to identify schema mismatches or formatting issues.
- Traceback Analysis: When an Agent fails, follow the traceback to identify if the error originated from data acquisition, processing, or the LLM itself.
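
One way to enable the detailed logging mentioned above uses Python's standard `logging` module. This is a minimal sketch: it assumes the platform's loggers are named after their module paths, and the log format string is an arbitrary choice.

```python
import logging

# Keep the rest of the application at INFO to limit noise.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Setting the level on a parent logger also covers its children,
# so "src.intelligence.providers" applies to every provider module beneath it.
for logger_name in ("src.intelligence.providers", "src.intelligence.router"):
    logging.getLogger(logger_name).setLevel(logging.DEBUG)
```
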