Implement DebtGuardian: LLM-based Technical Debt detection framework#22
Co-authored-by: Icar0S <39846852+Icar0S@users.noreply.github.com>
Copilot changed the title from "[WIP] Set up experimental configuration for DebtGuardian framework" to "Implement DebtGuardian: LLM-based Technical Debt detection framework" on Dec 8, 2025.
Implements the DebtGuardian framework for detecting technical debt (TD) in source code changes using Qwen2.5-Coder:7b via Ollama (77% average recall in the reference study; runs locally). Based on the methodology of the paper "Detecting Technical Debt in Source Code Changes using Large Language Models", with a 3-stage pipeline: code analysis → LLM detection → output validation.
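The output-validation stage of the pipeline above can be sketched as parsing the model's JSON response into typed finding records and dropping anything outside the seven TD categories. This is an illustrative sketch only: the real framework uses Pydantic models under `src/debt_guardian/schemas/`, and the field names here (`td_type`, `location`, `severity`, `confidence`) are assumptions, not the actual schema.

```python
import json
from dataclasses import dataclass

# The 7 TD types supported by the framework.
TD_TYPES = {"design", "documentation", "defect", "test",
            "compatibility", "build", "requirement"}

@dataclass
class Finding:
    """Hypothetical finding record; the real schema is Pydantic-based."""
    td_type: str
    location: str
    severity: str
    confidence: float

def parse_findings(raw: str) -> list[Finding]:
    """Validate the model's JSON output (stage 3 of the pipeline)."""
    findings = []
    for item in json.loads(raw):
        if item.get("td_type") not in TD_TYPES:
            continue  # drop hallucinated debt categories
        findings.append(Finding(
            td_type=item["td_type"],
            location=item.get("location", "unknown"),
            severity=item.get("severity", "low"),
            confidence=float(item.get("confidence", 0.0)),
        ))
    return findings

# Example model output for a diff that removed a docstring:
raw = ('[{"td_type": "documentation", "location": "utils.py:12", '
       '"severity": "medium", "confidence": 0.8}]')
print(parse_findings(raw)[0].td_type)  # -> documentation
```

A schema-level filter like this is what lets majority voting operate on structured findings rather than free-form model text.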
Core Framework (`src/debt_guardian/`)
- `config.py`: Ollama client settings, prompting strategies, TD type selection, majority voting
- `schemas/`: Pydantic models for the 7 TD types (design, documentation, defect, test, compatibility, build, requirement) with location, severity, and confidence
- `llm_client.py`: Ollama integration with structured JSON parsing and health checks
- `prompts/`: Zero-shot, few-shot, batch, and granular templates with per-type examples
- `utils/`: Repository connector, commit history, diff extraction, line-number parsing
- `validators/`: Pydantic + Guardrails-AI foundation for response validation
- `detector.py`: Main orchestrator supporting single-diff, commit, and batch analysis with majority voting

REST API
6 endpoints under `/api/debt-guardian/`:
- `POST /analyze/diff` - Analyze a code diff
- `POST /analyze/commit/<sha>` - Analyze a commit
- `POST /analyze/repository` - Batch analysis
- `GET /health`, `GET /config`, `GET /types`

Usage
Setup Requirements
- `curl -fsSL https://ollama.ai/install.sh | sh`
- `ollama pull qwen2.5-coder:7b` (~4.7 GB)
- `ollama serve`
- `pip install ollama pydantic gitpython guardrails-ai`
- `python check_setup.py`

Testing
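The contents of `check_setup.py` are not shown in this summary, but a setup check in its spirit could verify that the required Python packages are importable before anything touches Ollama. The package list below is taken from the `pip install` line above; everything else is an illustrative assumption.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Import names for the pip packages listed in the setup steps
# (note: the gitpython package imports as `git`).
REQUIRED = ["ollama", "pydantic", "git", "guardrails"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All Python dependencies found.")
```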
- `python examples/analyze_sample.py`
- `check_setup.py` validates the installation

Documentation
- `SETUP_TESTING_GUIDE.md` - Complete setup for testing on other projects
- `docs/DEBT_GUARDIAN.md` - Full framework reference
- `docs/DEBT_GUARDIAN_QUICKSTART.md` - 5-minute quick start
- `IMPLEMENTATION_SUMMARY.md` - Technical architecture details

Dependencies
Integrated into the existing Flask API at `/api/debt-guardian/*`.

Original prompt
I need you to create a branch that will serve as an experimental setup. The goal here is to configure the application of a framework presented in the paper "Detecting Technical Debt in Source Code Changes using Large Language Models". If this process succeeds, we will be able to test this framework configuration on other projects. We want to use only a single LLM that runs locally, preferably Qwen2.5-Coder:7b (via Ollama); you are using the model that achieved 77% average recall in the study, outperforming much larger models.