Multi-agent AI grading system with 3 specialized agents (2 Examiners + 1 Arbiter). 5x throughput improvement. LangGraph, DSPy, ChromaDB. Capstone project at IFF.

🎓 AI Grading System (Multi-Agent)


An autonomous, multi-agent system designed to automate the grading of complex academic essay questions with human-level reasoning and pedagogical feedback. Developed as a Capstone Project (TCC) in Computer Engineering.

✨ NEW: Professor Assistant Module with Analytics Dashboard!

📊 Results & Impact

Currently in pilot deployment at Instituto Federal Fluminense (IFF) with real coursework and students.

  • 5x throughput improvement: grading 30 submissions drops from 10+ minutes to ~2 minutes
  • 90% reduction in vector DB queries via intelligent RAG caching
  • Dual-examiner consensus: two independent Examiner agents plus an Arbiter reduce individual grading bias
  • Full explainability: every grade includes a written justification traceable to rubric criteria
  • 10+ analytics visualizations: student evolution tracking, plagiarism detection, learning-gap identification
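The RAG-caching claim above can be illustrated with a minimal memoization sketch: retrieval is keyed by the question, so grading many submissions to the same question hits the vector DB once instead of once per student. `query_vector_db` and `get_rubric_context` are illustrative names, not the repo's actual API:

```python
from functools import lru_cache

def query_vector_db(question: str) -> list[str]:
    """Hypothetical stand-in for a ChromaDB similarity search."""
    return [f"context for {question}"]

@lru_cache(maxsize=256)
def get_rubric_context(question: str) -> tuple[str, ...]:
    # Cache per question: 30 submissions to the same question
    # trigger one retrieval instead of 30.
    return tuple(query_vector_db(question))

for _student in range(30):
    ctx = get_rubric_context("Explain TCP congestion control")

print(get_rubric_context.cache_info())  # 1 miss, 29 hits
```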

🚀 Quick Deploy

Option 1: Streamlit Cloud (Recommended - 2 minutes)

Deploy to Streamlit

  1. Click the badge above (or go to https://share.streamlit.io/)
  2. Login with GitHub
  3. Select:
    • Repository: savinoo/ai-grading-system
    • Branch: feature/professor-assistant
    • Main file: app/main.py
  4. Click "Deploy"
  5. Add Secrets (Settings > Secrets):
    GOOGLE_API_KEY = "your-gemini-api-key"
    MODEL_NAME = "gemini-2.0-flash"
    TEMPERATURE = "0"
  6. Get Gemini API key (free): https://aistudio.google.com/app/apikey

Done! Your app will be live at: https://[your-app-name].streamlit.app

Option 2: Local Development

```shell
git clone https://github.com/savinoo/ai-grading-system.git
cd ai-grading-system
pip install -r requirements.txt

# Create .streamlit/secrets.toml with your API key
streamlit run app/main.py
```

⚡ Performance & Recent Improvements

Latest Update (2026-02-10):

🚀 Speed: 5x Faster Grading

  • Before: 10 students × 3 questions = ~10 minutes
  • After: 10 students × 3 questions = ~2-3 minutes
  • How: increased parallelism (API_CONCURRENCY 2 → 10)

🎯 Quality: Production-Grade Reliability

  • ✅ Grade normalization: auto-detects and fixes 0-1 vs 0-10 scale issues
  • ✅ Robust error handling: graceful fallbacks when the LLM returns invalid JSON
  • ✅ Performance logging: detailed timing for debugging bottlenecks
  • ✅ RAG caching: 90% reduction in vector DB queries
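The grade-normalization bullet can be sketched as a simple scale heuristic; the repo's actual detection logic may differ from this assumption:

```python
def normalize_grades(grades: list[float], max_score: float = 10.0) -> list[float]:
    """Detect a 0-1 fraction scale slipping in where 0-max_score is expected."""
    if grades and all(0.0 <= g <= 1.0 for g in grades):
        # Every grade looks like a fraction; rescale to the rubric's range.
        # (Heuristic: misfires if all true grades genuinely fall below 1.0.)
        return [round(g * max_score, 2) for g in grades]
    # Otherwise clamp stray values into the valid range.
    return [min(max(g, 0.0), max_score) for g in grades]

print(normalize_grades([0.8, 0.95, 0.6]))  # fractions -> rescaled to 0-10
print(normalize_grades([8.0, 9.5, 12.0]))  # already 0-10; 12.0 is clamped
```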

📚 Documentation

Configuration

For Gemini free-tier (recommended to avoid rate limits):

```shell
export API_CONCURRENCY=5
export API_THROTTLE_SLEEP=0.5
```

For OpenAI (paid tier):

```shell
export API_CONCURRENCY=10  # or higher for more speed
```

See PERFORMANCE.md for the full configuration guide.
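One way these variables might be consumed in code, shown as a dependency-free asyncio sketch. The semaphore-plus-sleep pattern is an assumption for illustration; the repo's actual concurrency mechanism may differ:

```python
import asyncio
import os

API_CONCURRENCY = int(os.getenv("API_CONCURRENCY", "5"))
API_THROTTLE_SLEEP = float(os.getenv("API_THROTTLE_SLEEP", "0.5"))

async def call_llm(prompt: str, sem: asyncio.Semaphore) -> str:
    """Stand-in for one LLM call; the semaphore caps in-flight requests."""
    async with sem:
        await asyncio.sleep(API_THROTTLE_SLEEP)  # throttle in place of a real API call
        return f"graded: {prompt}"

async def grade_batch(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(API_CONCURRENCY)
    return list(await asyncio.gather(*(call_llm(p, sem) for p in prompts)))

results = asyncio.run(grade_batch([f"answer {i}" for i in range(8)]))
print(len(results))
```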


🧠 Core Architecture

This system leverages a Multi-Agent Workflow orchestrated by LangGraph and optimized with DSPy for robust prompt engineering.

The Agents

  1. πŸ” Examiner Agent (C1 & C2): Two independent instances that grade student submissions against a detailed rubric using RAG (Retrieval-Augmented Generation) for context.
  2. βš–οΈ Arbiter Agent: Activated only when C1 and C2 diverge significantly (e.g., score difference > 1.5). It reviews arguments from both and decides the final grade.
  3. 🧬 Analytics Engine: Runs in parallel to detect semantic plagiarism and analyze student evolution trends across submissions.
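The consensus rule above can be sketched as follows. The 1.5-point threshold comes from the agent description; the averaging fallback and the `arbitrate` callback are assumptions for illustration:

```python
DIVERGENCE_THRESHOLD = 1.5  # score gap that triggers the Arbiter

def final_grade(g1: float, g2: float, arbitrate) -> float:
    """Average when the two Examiners agree; escalate to the Arbiter otherwise."""
    if abs(g1 - g2) > DIVERGENCE_THRESHOLD:
        return arbitrate(g1, g2)   # Arbiter reviews both arguments
    return (g1 + g2) / 2           # consensus: simple mean

print(final_grade(7.0, 7.5, arbitrate=max))  # 7.25 (consensus)
print(final_grade(4.0, 8.0, arbitrate=max))  # 8.0  (Arbiter decides; max is a stand-in)
```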

Workflow Diagram

```mermaid
graph TD
    A[Start] --> B(RAG Context Retrieval)
    B --> C1[Examiner 1]
    B --> C2[Examiner 2]
    C1 --> D{Divergence Check}
    C2 --> D
    D -- "Diff > Threshold" --> E[Arbiter Agent]
    D -- "Consensus" --> F[Final Grade Calculation]
    E --> F
    F --> G[Feedback Generation]
    G --> H[Analytics & Insights]
    H --> I[End]
```

🆕 Professor Assistant Module

NEW in v2.0! Advanced analytics and student tracking system.

Features

👤 Student Profile

  • Grade evolution tracking with trend detection
  • Learning gap identification (<60% criterion avg)
  • Strength recognition (>80% criterion avg)
  • Heatmap visualization of criterion performance
  • Confidence-scored predictions (linear regression with R²)

🏫 Class Analytics

  • Statistical distribution (mean, median, std dev, Q1, Q3, IQR)
  • Grade distribution (A/B/C/D/F buckets)
  • Outlier detection (struggling students & top performers)
  • Common learning gaps across class (>30% affected)
  • Question difficulty ranking
  • Top 5 student comparison (radar chart)
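The distribution and outlier bullets can be sketched with stdlib statistics. The 1.5×IQR fence is the standard Tukey rule; the repo's exact cutoffs are not specified here:

```python
from statistics import mean, median, quantiles, stdev

def class_summary(grades: list[float]) -> dict:
    """Distribution stats plus IQR-based outlier flags for a class."""
    q1, _q2, q3 = quantiles(grades, n=4)     # quartiles (exclusive method)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": round(mean(grades), 2),
        "median": median(grades),
        "stdev": round(stdev(grades), 2),
        "q1": q1, "q3": q3, "iqr": iqr,
        "struggling": [g for g in grades if g < low],   # low outliers
        "top": [g for g in grades if g > high],         # high outliers
    }

print(class_summary([6.0, 7.0, 7.5, 8.0, 8.5, 9.0, 2.0, 7.0]))
```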

📊 Visualizations

  • 10+ interactive Plotly charts
  • Dual-axis comparisons
  • Heatmaps with colorscales
  • Box plots, violin plots, radar charts
  • Responsive design with gradient headers

💾 Persistent Memory

  • JSON-based student profile storage
  • GDPR-compliant data deletion
  • Automatic 365-day retention policy
  • Export functionality for reports
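The 365-day retention policy could be enforced with a pruning pass like the sketch below. The `updated_at` ISO-8601 field is an assumed schema detail, not necessarily the repo's actual profile format:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 365  # from the retention policy above

def prune_profiles(profiles: dict[str, dict]) -> dict[str, dict]:
    """Drop JSON profiles whose last update is past the retention window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    return {
        sid: p for sid, p in profiles.items()
        if datetime.fromisoformat(p["updated_at"]) >= cutoff
    }

profiles = {
    "alice": {"updated_at": datetime.now(timezone.utc).isoformat()},
    "bob": {"updated_at": "2020-01-01T00:00:00+00:00"},  # stale profile
}
print(sorted(prune_profiles(profiles)))  # ['alice']
```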

Access: Sidebar > "📊 Analytics Dashboard"


🚀 Key Features

  • Massive Parallel Processing: Optimized to handle batch corrections without hitting LLM Rate Limits (using Tenacity + Chunking).
  • Cost-Efficient Intelligence: Uses a tiered model strategy (Gemini 2.0 Flash for volume, Pro for complex arbitration).
  • Resilience: Self-healing logic for API errors and JSON formatting hallucinations.
  • Pedagogical Feedback: Generates constructive comments explaining why a grade was given.
  • Student Tracking: Longitudinal performance analysis with trend detection.
  • Auto-Tracking: Analytics automatically capture data during batch corrections.
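The chunk-then-retry pattern behind the first bullet can be sketched without dependencies (Tenacity handles retries in the actual stack; `with_retries` below is a plain-Python stand-in, and the error type and backoff values are illustrative):

```python
import time
from collections.abc import Callable, Iterator

def chunked(items: list, size: int) -> Iterator[list]:
    """Yield fixed-size batches so a burst never exceeds the rate limit."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def with_retries(fn: Callable[[], str], attempts: int = 3, backoff: float = 0.1) -> str:
    """Retry with exponential backoff, mimicking a Tenacity-style decorator."""
    for attempt in range(attempts):
        try:
            return fn()
        except RuntimeError:                      # e.g. a transient 429 from the API
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)    # 0.1s, 0.2s, ...

batches = list(chunked(list(range(7)), size=3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]

calls = {"n": 0}
def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"

result = with_retries(flaky)
print(result, calls["n"])  # ok 3
```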

🛠️ Tech Stack

  • Orchestration: LangGraph
  • Prompt Optimization: DSPy (Stanford)
  • LLM: Google Gemini 2.0 Flash (via LiteLLM)
  • Interface: Streamlit
  • Vector DB: ChromaDB (for RAG)
  • Analytics: Plotly, NumPy, SciPy
  • Testing: Pytest (90%+ coverage on analytics)

📦 Installation & Setup

Prerequisites

  • Python 3.10+
  • Google Gemini API key (free tier available)

Steps

  1. Clone the repo:

    git clone https://github.com/savinoo/ai-grading-system.git
    cd ai-grading-system
  2. Install dependencies:

    pip install -r requirements.txt
  3. Configure Environment: Create .streamlit/secrets.toml:

    GOOGLE_API_KEY = "your-api-key-here"
    MODEL_NAME = "gemini-2.0-flash"
    TEMPERATURE = "0"

    Get API key: https://aistudio.google.com/app/apikey

  4. Run the App:

    streamlit run app/main.py
  5. Open Browser:

    http://localhost:8502
    

📖 Usage

Single Student Debug Mode

  1. Select "Single Student (Debug)" in sidebar
  2. Configure question and rubric
  3. Provide student answer
  4. Execute and review detailed results

Batch Processing (Class)

  1. Select "Batch Processing (Turma)"
  2. Choose "Simulação Completa (IA)" (full AI simulation)
  3. Configure: 5 questions, 5-10 students
  4. Generate questions → simulate answers → execute corrections
  5. View the results dashboard with class metrics

Analytics Dashboard

  1. After running batch corrections, select "📊 Analytics Dashboard"
  2. Navigate tabs:
    • Overview: Total students, submissions, global metrics
    • Student Profile: Individual student analysis with visualizations
    • Class Analysis: Aggregate statistics and insights

Note: Analytics automatically track data during batch corrections. No manual setup needed!


🧪 Testing

Run tests:

```shell
pytest tests/ -v
```

Test coverage (analytics modules):

```shell
pytest tests/test_analytics.py --cov=src/analytics
```

📂 Project Structure

```text
ai-grading-system/
├── app/
│   ├── main.py              # Streamlit entry point
│   ├── analytics_ui.py      # Analytics visualizations (NEW)
│   └── ui_components.py     # UI helpers
├── src/
│   ├── agents/              # Examiner, Arbiter agents
│   ├── analytics/           # Student tracker, class analyzer (NEW)
│   ├── domain/              # Pydantic schemas
│   ├── memory/              # Persistent storage (NEW)
│   ├── workflow/            # LangGraph workflow
│   ├── rag/                 # Vector DB, retrieval
│   └── config/              # Settings, prompts
├── tests/                   # Unit tests
├── examples/                # Integration examples
├── DEPLOY.md                # Deployment guide
└── requirements.txt         # Dependencies
```

🎨 Screenshots

Analytics Dashboard

(Add screenshots after deployment)

Student Profile:

  • Gradient header with performance cards
  • Dual chart (line + bar) with trend line
  • Heatmap of criterion evolution
  • Severity-coded learning gaps

Class Analytics:

  • Statistical distribution (3-tab view)
  • Ranking with medals and trend indicators
  • Radar chart comparison (Top 5)
  • Question difficulty ranking

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request

📄 License

MIT License - See LICENSE file for details


👥 Authors

Lucas Lorenzo Savino & Maycon Mendes
Computer Engineering - Instituto Federal Fluminense (IFF)

Capstone Project (TCC) - 2024/2025


πŸ™ Acknowledgments

  • LangGraph - Agent orchestration framework
  • DSPy - Prompt optimization (Stanford)
  • Google Gemini - LLM API
  • Streamlit - Interactive web framework
  • OpenClaw - Development automation

📞 Support

Issues? Questions?

  • Open a GitHub issue
  • Contact: github.com/savinoo

⭐ Star this repo if you find it useful!
