Sudhss/Axios-Sovereign

PROJECT DOCUMENT: AXIOM-SOVEREIGN (WINNER'S EDITION)

  1. Title & Basic Info
     • Project Name: Axiom-Sovereign
     • One-line Tagline: Distributed Autonomous Remediation with Multi-Agent Explainability & Governance.
     • Domain / Category: SRE / Autonomous Agents / Chaos Engineering.
     • Team: The Butchers
  2. Elevator Pitch
     Axiom-Sovereign is a lean autonomous remediation engine that resolves infrastructure failures through a multi-agent "Debate" protocol. It features sub-10ms system-level event detection (simulated for the demo), an AI Explainability panel, and automated Jira governance so that every fix is audited and justified.
  3. The 48-Hour "Winner" Features
     1. AI Explainability Panel: Live UI section showing the agent's Reasoning, Hypothesis, and Confidence Scores.
     2. Auto Root Cause Graph: A visual flow showing failure propagation (e.g., DB Down → App Degraded).
     3. Structured Agent Debate: Two internal LLM prompts (Agent A vs Agent B) whose proposals are resolved into a single JSON-structured decision.
     4. Smart Retry Strategy: Fixes escalate in stages: Restart (L1) → Cache Clear (L2) → Human Escalation (L3).
     5. Jira State Machine: Automated ticket lifecycle (To Do → Done) with the "Decision JSON" logged as a comment.
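The Smart Retry Strategy can be pictured as a ladder of fixes that is climbed until one works. The sketch below is a minimal illustration, not the project's actual code: `run_ladder` and the stand-in fix functions are hypothetical names, and each fix is assumed to return True when a follow-up health check passes.

```python
from typing import Callable, List, Tuple

Fix = Tuple[str, Callable[[], bool]]  # (name, function returning True on recovery)


def run_ladder(steps: List[Fix]) -> dict:
    """Try each automated fix in order and stop at the first one that succeeds."""
    attempted = []
    for level, (name, fix) in enumerate(steps, start=1):
        attempted.append(name)
        try:
            ok = fix()
        except Exception:
            ok = False  # a crashing fix counts as a failed attempt
        if ok:
            return {"resolved": True, "level": level, "fix": name, "attempted": attempted}
    # No automated fix worked: hand over to a human (L3 when there are two automated steps).
    return {
        "resolved": False,
        "level": len(steps) + 1,
        "fix": "human_escalation",
        "attempted": attempted,
    }


# Stand-in fixes for the demo: the restart fails, the cache clear succeeds.
demo_steps = [("restart", lambda: False), ("cache_clear", lambda: True)]
result = run_ladder(demo_steps)
print(result)
```

Returning the list of attempted fixes keeps the escalation path available for the Explainability Panel and the Jira comment.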
  4. Problem Statement
     • The Crisis: Manual L1 triage is the biggest bottleneck in SRE, leading to high MTTR.
     • The Gap: Standard bots act without explaining why, which makes them risky in production.
     • The Solution: A system that debates the best fix before executing it and records its reasoning for every action.
  5. System Architecture (Lean & Real)
     • Sentinel Layer (C++): Lightweight system-level monitoring (simulating kernel-level events for demo speed) to detect log/state changes.
     • Intelligence Orchestrator (Python): Central engine managing the Multi-Agent Debate and the Jira API.
     • Governance Layer (Jira): Source of truth for incident state and audit trail.
     • Chaos Controller: Dashboard buttons to "Choke" nodes (Stop Process, DB Timeout).
     • Visualizer (Streamlit): Unified UI for the Node Map, RCA Graph, and Explainability Panel.
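One way the RCA Graph could derive its edges is a breadth-first walk over a dependency map, starting from the failed component. This is only a sketch under assumptions: the `DEPENDENTS` map, the `gateway` node, and `propagation_edges` are hypothetical and do not come from the project.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical dependency map: key = component, value = components that depend on it.
DEPENDENTS: Dict[str, List[str]] = {
    "db": ["app"],
    "app": ["gateway"],
    "gateway": [],
}


def propagation_edges(root: str, dependents: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Walk outward from the failed component and return (cause, effect) edges."""
    edges: List[Tuple[str, str]] = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:  # guard against cycles in the map
                seen.add(child)
                edges.append((node, child))
                queue.append(child)
    return edges


edges = propagation_edges("db", DEPENDENTS)
print(" ; ".join(f"{cause} -> {effect}" for cause, effect in edges))
```

The resulting edge list can be handed to whatever graph widget the Visualizer uses.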
  6. The Logic: Sense-Think-Act-Explain
     1. Sense: The C++ Sentinel flags a CRITICAL state change.
     2. Think (JSON Debate): The Orchestrator runs a dual-prompt debate. Internal JSON output:

        {
          "agent_A": {"action": "restart_db", "confidence": 0.85},
          "agent_B": {"action": "check_pool", "confidence": 0.72},
          "final_decision": "restart_db",
          "reasoning": "High confidence in service recovery via restart."
        }

     3. Act: Executes systemctl restart (or a mock script) via subprocess.
     4. Explain: Updates the UI and Jira with the "Why", using the reasoning field from the JSON.
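The document does not say how the final decision is chosen from the two proposals. A minimal sketch, assuming the higher confidence score wins and ties go to Agent A, could look like this (`resolve_debate` is a hypothetical helper, and the two proposals would normally be parsed from the LLM responses):

```python
import json


def resolve_debate(proposal_a: dict, proposal_b: dict) -> dict:
    """Build the Decision JSON; assumption: the higher-confidence proposal wins."""
    for proposal in (proposal_a, proposal_b):
        if not 0.0 <= proposal["confidence"] <= 1.0:
            raise ValueError(f"confidence out of range: {proposal}")
    winner = proposal_a if proposal_a["confidence"] >= proposal_b["confidence"] else proposal_b
    return {
        "agent_A": proposal_a,
        "agent_B": proposal_b,
        "final_decision": winner["action"],
        "reasoning": f"Selected '{winner['action']}' with confidence {winner['confidence']:.2f}.",
    }


# Same proposals as in the example output above.
decision = resolve_debate(
    {"action": "restart_db", "confidence": 0.85},
    {"action": "check_pool", "confidence": 0.72},
)
print(json.dumps(decision, indent=2))
```

Validating the confidence range before comparing guards against malformed LLM output reaching the Act step.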
  7. The Winning Demo Flow
     1. Baseline: System is green. Traffic is flowing on the visualizer.
     2. Chaos: Click "CHOKE DATABASE".
     3. Detection: Dashboard turns red; the Sentinel alerts the Orchestrator.
     4. Reasoning: The Explainability Panel populates with the JSON Debate.
     5. Governance: A Jira ticket is created; the Ticket ID appears on the UI.
     6. Remediation: The agent executes the fix. System returns to green.
     7. Closure: The Jira ticket auto-moves to DONE.
     8. Punchline: "Not only does it fix the issue, it explains exactly WHY it fixed it."
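The Governance and Closure steps map onto two Jira Cloud REST calls: adding a comment and applying a workflow transition. The sketch below only builds the request descriptions so it runs without network access or credentials. The base URL, the issue key `OPS-1`, and the transition id `"31"` are placeholders; real transition ids are workflow-specific and can be listed with a GET on the same transitions endpoint.

```python
import json


def comment_request(base_url: str, issue_key: str, decision: dict) -> dict:
    """Describe the POST that logs the Decision JSON as a comment (Atlassian Document Format)."""
    text = json.dumps(decision, indent=2)
    return {
        "method": "POST",
        "url": f"{base_url}/rest/api/3/issue/{issue_key}/comment",
        "json": {
            "body": {
                "type": "doc",
                "version": 1,
                "content": [
                    {"type": "codeBlock", "content": [{"type": "text", "text": text}]}
                ],
            }
        },
    }


def transition_request(base_url: str, issue_key: str, transition_id: str) -> dict:
    """Describe the POST that moves the ticket to its next workflow state."""
    return {
        "method": "POST",
        "url": f"{base_url}/rest/api/3/issue/{issue_key}/transitions",
        "json": {"transition": {"id": transition_id}},
    }


# Placeholder values, for illustration only.
BASE = "https://example.atlassian.net"
comment = comment_request(BASE, "OPS-1", {"final_decision": "restart_db"})
close = transition_request(BASE, "OPS-1", "31")
print(comment["url"])
print(close["url"], close["json"])
```

Each dictionary can be passed to an HTTP client of choice together with the team's Jira credentials.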
  8. Tech Stack
     • Monitoring: C++.
     • Orchestration: Python (FastAPI/Streamlit).
     • Intelligence: Gemini 1.5 Flash (Fast & Lean).
     • Governance: Jira Cloud API.
     • State Store: Local global_state.json (no Redis or external DB, to keep the 48-hour build fast).
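With a single JSON file as the state store, a reader could open the file while a writer is halfway through saving it. One common way around that, sketched here as an assumption rather than the project's actual approach, is to write to a temporary file and swap it into place with `os.replace`. The helper names and the state keys are hypothetical.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write to a temp file in the same directory, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # readers see either the old file or the new one


def load_state(path: str) -> dict:
    """Return the stored state, or an empty dict if nothing has been saved yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


# Demo in a throwaway directory so the real global_state.json is untouched.
with tempfile.TemporaryDirectory() as workdir:
    state_path = os.path.join(workdir, "global_state.json")
    before = load_state(state_path)
    save_state(state_path, {"db": "CRITICAL"})
    after = load_state(state_path)
print(before, after)
```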
  9. Evaluation Strategy
     • MTTR: Time from Choke to Jira Done.
     • Explainability Score: Clarity of the Agent Debate log.
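The MTTR metric can be computed from two timestamps per incident. A minimal sketch, assuming each incident is recorded with ISO 8601 `choked_at` and `done_at` fields (both field names and the sample data are made up for illustration):

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List


def mttr_seconds(incidents: List[Dict[str, str]]) -> float:
    """Mean time from Choke to Jira Done, in seconds."""
    durations = [
        (
            datetime.fromisoformat(item["done_at"])
            - datetime.fromisoformat(item["choked_at"])
        ).total_seconds()
        for item in incidents
    ]
    return mean(durations)


# Two made-up demo runs: 42 s and 18 s.
incidents = [
    {"choked_at": "2024-01-01T10:00:00", "done_at": "2024-01-01T10:00:42"},
    {"choked_at": "2024-01-01T11:00:00", "done_at": "2024-01-01T11:00:18"},
]
print(mttr_seconds(incidents))  # 30.0
```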
  10. ChatGPT / LLM Context Prompt
     "I am building 'Axiom-Sovereign' with team 'The Butchers'. It's an autonomous remediation system for a hackathon. It uses C++ monitoring and a Python orchestrator. Key features: Multi-agent debate (outputting JSON with confidence scores), Jira integration, and a Streamlit explainability panel. Help me implement the JSON debate logic and the Jira status transition."

About

Autonomous SRE system that detects failures, debates fixes using multi-agent AI, and executes safe remediation with full explainability.
