An autonomous, multi-agent system designed to automate the grading of complex academic essay questions with human-level reasoning and pedagogical feedback. Developed as a Capstone Project (TCC) in Computer Engineering.
✨ NEW: Professor Assistant Module with Analytics Dashboard!
Currently in pilot deployment at Instituto Federal Fluminense (IFF) with real coursework and students.
- 5x throughput improvement: grading 30 submissions reduced from 10+ minutes to ~2 minutes
- 90% reduction in vector DB queries via intelligent RAG caching
- Dual-examiner consensus: 2 independent Examiner agents + 1 Arbiter reduce bias
- Full explainability: every grade includes a written justification traceable to rubric criteria
- 10+ analytics visualizations: student evolution tracking, plagiarism detection, learning gap identification
- Click the badge above (or go to https://share.streamlit.io/)
- Log in with GitHub
- Select:
  - Repository: savinoo/ai-grading-system
  - Branch: feature/professor-assistant
  - Main file: app/main.py
- Click "Deploy"
- Add secrets (Settings > Secrets):

  ```toml
  GOOGLE_API_KEY = "your-gemini-api-key"
  MODEL_NAME = "gemini-2.0-flash"
  TEMPERATURE = "0"
  ```
- Get Gemini API key (free): https://aistudio.google.com/app/apikey
Done! Your app will be live at: https://[your-app-name].streamlit.app
```bash
git clone https://github.com/savinoo/ai-grading-system.git
cd ai-grading-system
pip install -r requirements.txt
# Create .streamlit/secrets.toml with your API key
streamlit run app/main.py
```

Latest Update (2026-02-10):
- Before: 10 students × 3 questions = ~10 minutes
- After: 10 students × 3 questions = ~2-3 minutes
- How: increased parallelism (API_CONCURRENCY 2 → 10)
- ✅ Grade normalization: auto-detects and fixes 0-1 vs 0-10 scale issues (see the sketch below)
- ✅ Robust error handling: graceful fallbacks when the LLM returns invalid JSON
- ✅ Performance logging: detailed timing for debugging bottlenecks
- ✅ RAG caching: 90% reduction in vector DB queries (see the caching sketch at the end of this section)
- PERFORMANCE.md: benchmarks, configuration, troubleshooting
- CHANGELOG.md: detailed changelog with migration guide
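The grade normalization fix is a small heuristic. A minimal sketch of the idea (hypothetical function, not the project's actual implementation):

```python
def normalize_grade(score: float, max_scale: float = 10.0) -> float:
    """Rescale grades that look like they are on a 0-1 scale to 0-10.

    Caveat: a genuine 1.0/10 would be misread as 1.0/1.0, so the real
    system presumably also checks the rubric's expected scale.
    """
    if 0.0 <= score <= 1.0:  # likely a 0-1 probability-style score
        score = score * max_scale
    return round(min(max(score, 0.0), max_scale), 2)  # clamp to [0, 10]
```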
For the Gemini free tier (recommended to avoid rate limits):

```bash
export API_CONCURRENCY=5
export API_THROTTLE_SLEEP=0.5
```

For OpenAI (paid tier):

```bash
export API_CONCURRENCY=10  # or higher for more speed
```

See PERFORMANCE.md for the full configuration guide.
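A minimal sketch of the caching idea behind the "90% fewer vector DB queries" figure, assuming a ChromaDB collection like the one in the stack below (names are illustrative, not the project's actual cache):

```python
from functools import lru_cache

import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("rubric_context")

@lru_cache(maxsize=512)
def cached_retrieve(query_text: str, k: int = 4) -> tuple[str, ...]:
    """Identical queries hit the vector DB once; repeats are served from memory."""
    result = collection.query(query_texts=[query_text], n_results=k)
    return tuple(result["documents"][0])
```

In a batch run, every student answering the same question shares the same retrieval query, which is why the cache hit rate is so high.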
This system leverages a Multi-Agent Workflow orchestrated by LangGraph and optimized with DSPy for robust prompt engineering.
- 📝 Examiner Agents (C1 & C2): Two independent instances that grade student submissions against a detailed rubric, using RAG (Retrieval-Augmented Generation) for context.
- ⚖️ Arbiter Agent: Activated only when C1 and C2 diverge significantly (e.g., score difference > 1.5). It reviews the arguments from both and decides the final grade (see the sketch after the diagram below).
- 🧬 Analytics Engine: Runs in parallel to detect semantic plagiarism and analyze student evolution trends across submissions.
```mermaid
graph TD
    A[Start] --> B(RAG Context Retrieval)
    B --> C1[Examiner 1]
    B --> C2[Examiner 2]
    C1 --> D{Divergence Check}
    C2 --> D
    D -- "Diff > Threshold" --> E[Arbiter Agent]
    D -- "Consensus" --> F[Final Grade Calculation]
    E --> F
    F --> G[Feedback Generation]
    G --> H[Analytics & Insights]
    H --> I[End]
```
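A condensed LangGraph sketch of the same graph, with stubbed node functions (all names and stub values are illustrative, not the project's actual modules):

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

DIVERGENCE_THRESHOLD = 1.5  # the arbiter steps in above this score gap

class GradingState(TypedDict, total=False):
    context: str
    score_c1: float
    score_c2: float
    final_score: float

# Stubbed nodes -- the real agents call the LLM with rubric-aware prompts.
def rag_context(state: GradingState) -> dict:
    return {"context": "retrieved rubric context..."}

def examiner_1(state: GradingState) -> dict:
    return {"score_c1": 8.0}

def examiner_2(state: GradingState) -> dict:
    return {"score_c2": 6.0}

def arbiter(state: GradingState) -> dict:
    # Reviews both examiners' arguments; midpoint used here as a placeholder.
    return {"final_score": (state["score_c1"] + state["score_c2"]) / 2}

def final_grade(state: GradingState) -> dict:
    consensus = (state["score_c1"] + state["score_c2"]) / 2
    return {"final_score": state.get("final_score", consensus)}

def check_divergence(state: GradingState) -> str:
    gap = abs(state["score_c1"] - state["score_c2"])
    return "arbiter" if gap > DIVERGENCE_THRESHOLD else "consensus"

graph = StateGraph(GradingState)
graph.add_node("rag_context", rag_context)
graph.add_node("examiner_1", examiner_1)
graph.add_node("examiner_2", examiner_2)
graph.add_node("divergence_check", lambda s: {})  # join point D
graph.add_node("arbiter", arbiter)
graph.add_node("final_grade", final_grade)

graph.add_edge(START, "rag_context")
graph.add_edge("rag_context", "examiner_1")  # C1 and C2 run in parallel
graph.add_edge("rag_context", "examiner_2")
graph.add_edge(["examiner_1", "examiner_2"], "divergence_check")
graph.add_conditional_edges("divergence_check", check_divergence,
                            {"arbiter": "arbiter", "consensus": "final_grade"})
graph.add_edge("arbiter", "final_grade")
graph.add_edge("final_grade", END)

app = graph.compile()
print(app.invoke({"context": ""}))  # gap 2.0 > 1.5 -> arbiter decides: 7.0
```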
NEW in v2.0! Advanced analytics and student tracking system.
- Grade evolution tracking with trend detection
- Learning gap identification (<60% criterion avg)
- Strength recognition (>80% criterion avg)
- Heatmap visualization of criterion performance
- Confidence-scored predictions (R² regression; see the sketch after this list)
- Statistical distribution (mean, median, std dev, Q1, Q3, IQR)
- Grade distribution (A/B/C/D/F buckets)
- Outlier detection (struggling students & top performers)
- Common learning gaps across class (>30% affected)
- Question difficulty ranking
- Top 5 student comparison (radar chart)
- 10+ interactive Plotly charts
- Dual-axis comparisons
- Heatmaps with colorscales
- Box plots, violin plots, radar charts
- Responsive design with gradient headers
- JSON-based student profile storage (see the storage sketch below)
- GDPR-compliant data deletion
- Automatic 365-day retention policy
- Export functionality for reports
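A minimal sketch of the math behind the trend detection and distribution stats above, using the NumPy/SciPy stack listed below (function names are illustrative):

```python
import numpy as np
from scipy import stats

def grade_trend(grades: list[float]) -> dict:
    """Linear trend over successive submissions; R^2 doubles as the confidence score."""
    x = np.arange(len(grades))
    fit = stats.linregress(x, grades)
    return {
        "slope": fit.slope,                            # >0 improving, <0 declining
        "confidence": fit.rvalue ** 2,                 # R^2 of the fit
        "predicted_next": fit.intercept + fit.slope * len(grades),
    }

def distribution_summary(grades: list[float]) -> dict:
    """Mean/median/std plus quartiles and IQR, as shown in the dashboard."""
    arr = np.asarray(grades)
    q1, q3 = np.percentile(arr, [25, 75])
    return {"mean": arr.mean(), "median": np.median(arr),
            "std": arr.std(ddof=1), "q1": q1, "q3": q3, "iqr": q3 - q1}

print(grade_trend([5.0, 6.5, 7.0, 8.0]))  # upward trend, high R^2
```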
Access: Sidebar > "📊 Analytics Dashboard"
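A sketch of what JSON profile storage with a retention sweep can look like (the paths and field names are assumptions, not the project's actual schema):

```python
import json
import time
from pathlib import Path

PROFILE_DIR = Path("data/profiles")  # assumed location
RETENTION_SECONDS = 365 * 24 * 3600  # the 365-day retention policy

def save_profile(student_id: str, profile: dict) -> None:
    """Persist one student profile as JSON, timestamped for retention checks."""
    PROFILE_DIR.mkdir(parents=True, exist_ok=True)
    profile["updated_at"] = time.time()
    (PROFILE_DIR / f"{student_id}.json").write_text(json.dumps(profile, indent=2))

def purge_expired_profiles() -> int:
    """Delete profiles older than the retention window (GDPR-style cleanup)."""
    removed = 0
    for path in PROFILE_DIR.glob("*.json"):
        data = json.loads(path.read_text())
        if time.time() - data.get("updated_at", 0) > RETENTION_SECONDS:
            path.unlink()
            removed += 1
    return removed
```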
- Massive parallel processing: optimized to handle batch corrections without hitting LLM rate limits, using Tenacity + chunking (see the sketch after this list).
- Cost-Efficient Intelligence: Uses a tiered model strategy (Gemini 2.0 Flash for volume, Pro for complex arbitration).
- Resilience: Self-healing logic for API errors and JSON formatting hallucinations.
- Pedagogical Feedback: Generates constructive comments explaining why a grade was given.
- Student Tracking: Longitudinal performance analysis with trend detection.
- Auto-Tracking: Analytics automatically capture data during batch corrections.
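A condensed sketch of the retry/chunking/self-healing pattern described above (the LLM client is stubbed; helper names are illustrative):

```python
import json
from itertools import islice

from tenacity import retry, stop_after_attempt, wait_exponential

class _StubLLM:  # stand-in for the real Gemini client
    def invoke(self, prompt: str) -> str:
        return '```json\n{"score": 8.5, "justification": "..."}\n```'

llm = _StubLLM()

def chunked(items, size):
    """Yield fixed-size chunks so a batch never exceeds the rate-limit budget."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=30))
def grade_one(prompt: str) -> dict:
    raw = llm.invoke(prompt)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Self-healing: strip the markdown fences the model sometimes adds.
        cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
        return json.loads(cleaned)  # re-raises on failure -> Tenacity retries

for batch in chunked([f"prompt {i}" for i in range(12)], size=5):
    results = [grade_one(p) for p in batch]
```

The tiered model strategy could then route each call: Flash for routine grading, Pro only when the arbiter is invoked.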
- Orchestration: LangGraph
- Prompt Optimization: DSPy (Stanford)
- LLM: Google Gemini 2.0 Flash (via LiteLLM)
- Interface: Streamlit
- Vector DB: ChromaDB (for RAG)
- Analytics: Plotly, NumPy, SciPy
- Testing: Pytest (90%+ coverage on analytics)
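For reference, a minimal LiteLLM call against Gemini, matching the model and temperature in the config above (the prompt is just an illustration, and the Gemini API key is assumed to be set in the environment):

```python
from litellm import completion

response = completion(
    model="gemini/gemini-2.0-flash",  # LiteLLM's provider-prefixed model name
    messages=[{"role": "user", "content": "Grade this answer against the rubric: ..."}],
    temperature=0,
)
print(response.choices[0].message.content)
```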
- Python 3.10+
- Google Gemini API key (free tier available)
- Clone the repo:

  ```bash
  git clone https://github.com/savinoo/ai-grading-system.git
  cd ai-grading-system
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure the environment: create .streamlit/secrets.toml:

  ```toml
  GOOGLE_API_KEY = "your-api-key-here"
  MODEL_NAME = "gemini-2.0-flash"
  TEMPERATURE = "0"
  ```

  Get an API key: https://aistudio.google.com/app/apikey

- Run the app:

  ```bash
  streamlit run app/main.py
  ```

- Open your browser: http://localhost:8502
- Select "Single Student (Debug)" in sidebar
- Configure question and rubric
- Provide student answer
- Execute and review detailed results
- Select "Batch Processing (Turma)"
- Choose "SimulaΓ§Γ£o Completa (IA)"
- Configure: 5 questions, 5-10 students
- Generate questions → Simulate answers → Execute corrections
- View results dashboard with class metrics
- After running batch corrections, select "📊 Analytics Dashboard"
- Navigate tabs:
- Overview: Total students, submissions, global metrics
- Student Profile: Individual student analysis with visualizations
- Class Analysis: Aggregate statistics and insights
Note: Analytics automatically track data during batch corrections. No manual setup needed!
Run tests:

```bash
pytest tests/ -v
```

Test coverage (analytics modules):

```bash
pytest tests/test_analytics.py --cov=src/analytics
```

```text
ai-grading-system/
├── app/
│   ├── main.py            # Streamlit entry point
│   ├── analytics_ui.py    # Analytics visualizations (NEW)
│   └── ui_components.py   # UI helpers
├── src/
│   ├── agents/            # Examiner, Arbiter agents
│   ├── analytics/         # Student tracker, class analyzer (NEW)
│   ├── domain/            # Pydantic schemas
│   ├── memory/            # Persistent storage (NEW)
│   ├── workflow/          # LangGraph workflow
│   ├── rag/               # Vector DB, retrieval
│   └── config/            # Settings, prompts
├── tests/                 # Unit tests
├── examples/              # Integration examples
├── DEPLOY.md              # Deployment guide
└── requirements.txt       # Dependencies
```
(Add screenshots after deployment)
Student Profile:
- Gradient header with performance cards
- Dual chart (line + bar) with trend line
- Heatmap of criterion evolution
- Severity-coded learning gaps
Class Analytics:
- Statistical distribution (3-tab view)
- Ranking with medals and trend indicators
- Radar chart comparison (Top 5)
- Question difficulty ranking
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
MIT License - See LICENSE file for details
Lucas Lorenzo Savino & Maycon Mendes
Computer Engineering - Instituto Federal Fluminense (IFF)
Capstone Project (TCC) - 2024/2025
- LangGraph - Agent orchestration framework
- DSPy - Prompt optimization (Stanford)
- Google Gemini - LLM API
- Streamlit - Interactive web framework
- OpenClaw - Development automation
Issues? Questions?
- Open a GitHub issue
- Contact: github.com/savinoo
- Live Demo: [Coming soon - Deploy on Streamlit Cloud]
- Documentation: See DEPLOY.md, ANALYSIS.md, IMPLEMENTATION_SUMMARY.md
- GitHub: https://github.com/savinoo/ai-grading-system
⭐ Star this repo if you find it useful!