
Implement DebtGuardian: LLM-based Technical Debt detection framework#22

Draft
Copilot wants to merge 7 commits into main from copilot/setup-experimental-llm-framework

Conversation

Contributor

Copilot AI commented Dec 8, 2025

Implements the DebtGuardian framework for detecting technical debt (TD) in source code changes using Qwen2.5-Coder:7b via Ollama (77% average recall in the paper's evaluation; runs locally). Follows the paper's methodology with a three-stage pipeline: code analysis → LLM detection → output validation.

Core Framework (src/debt_guardian/)

  • Configuration (config.py): Ollama client settings, prompting strategies, TD type selection, majority voting
  • Schemas (schemas/): Pydantic models for 7 TD types (design, documentation, defect, test, compatibility, build, requirement) with location, severity, confidence
  • LLM Client (llm_client.py): Ollama integration with structured JSON parsing and health checks
  • Prompts (prompts/): Zero-shot, few-shot, batch, and granular templates with per-type examples
  • Git Analysis (utils/): Repository connector, commit history, diff extraction, line number parsing
  • Validator (validators/): Pydantic + Guardrails-AI foundation for response validation
  • Detector (detector.py): Main orchestrator supporting single diff, commit, and batch analysis with majority voting
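The shipped schemas are Pydantic models; as a rough illustration of the fields involved (the names below are simplified assumptions, not the actual model definitions), a single detected-debt record might look like:

```python
from dataclasses import dataclass

# Simplified sketch of a TD finding. The framework uses validated Pydantic
# models; real field names and types may differ.
@dataclass
class DebtFinding:
    debt_type: str     # one of: design, documentation, defect, test,
                       # compatibility, build, requirement
    symptom: str       # short description of the observed problem
    location: str      # e.g. "calculator.py:2"
    severity: str      # e.g. "low" / "medium" / "high"
    confidence: float  # 0.0-1.0, as reported by the model

finding = DebtFinding(
    debt_type="defect",
    symptom="possible division by zero",
    location="calculator.py:2",
    severity="high",
    confidence=0.9,
)
```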

REST API

6 endpoints under /api/debt-guardian/:

  • POST /analyze/diff - Analyze code diff
  • POST /analyze/commit/<sha> - Analyze commit
  • POST /analyze/repository - Batch analysis
  • GET /health, /config, /types
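A call to the diff endpoint might be assembled as in the sketch below. The JSON field names and the host/port are assumptions inferred from the `detect_in_diff()` signature; check the route handlers for the real request contract.

```python
import json
from urllib import request

# Hypothetical payload; field names mirror the detect_in_diff() arguments.
payload = {
    "code_diff": "+def calc(x, y):\n+    return x / y",
    "file_path": "calculator.py",
}
req = request.Request(
    "http://localhost:5000/api/debt-guardian/analyze/diff",  # assumed host/port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = request.urlopen(req)  # requires the Flask app and Ollama running
```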

Usage

from debt_guardian.config import DebtGuardianConfig
from debt_guardian.detector import DebtDetector

# Analyze code diff
config = DebtGuardianConfig(
    use_granular_prompting=True,  # Higher precision
    td_types=["design", "defect", "test"]
)
detector = DebtDetector(config)

report = detector.detect_in_diff(
    code_diff="+def calc(x,y):\n+    return x/y",  # Division by zero
    file_path="calculator.py"
)
# report.detected_debts contains: debt_type, symptom, location, severity, confidence, remediation

# Analyze repository
config = DebtGuardianConfig(repo_path="/path/to/repo")
detector = DebtDetector(config)
batch_report = detector.analyze_repository(max_commits=10)

Setup Requirements

  1. Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
  2. Pull model: ollama pull qwen2.5-coder:7b (~4.7GB)
  3. Start service: ollama serve
  4. Install deps: pip install ollama pydantic gitpython guardrails-ai
  5. Verify: python check_setup.py

Testing

  • Unit tests: 12/12 passing (schemas, config, prompts)
  • Example: python examples/analyze_sample.py
  • Setup checker: check_setup.py validates installation

Documentation

  • SETUP_TESTING_GUIDE.md - Complete setup for testing on other projects
  • docs/DEBT_GUARDIAN.md - Full framework reference
  • docs/DEBT_GUARDIAN_QUICKSTART.md - 5-minute quick start
  • IMPLEMENTATION_SUMMARY.md - Technical architecture details

Dependencies

ollama>=0.1.0
pydantic>=2.0.0
guardrails-ai
GitPython

Integrated into existing Flask API at /api/debt-guardian/*.

Original prompt

I need you to create a branch that will serve as an experimental setup. The goal is to configure the framework presented in the paper "Detecting Technical Debt in Source Code Changes using Large Language Models". If that succeeds, we will be able to test this framework configuration on other projects. We want to use only a single LLM that runs locally, preferably Qwen2.5-Coder:7b (via Ollama); you would be using the model that achieved 77% average recall in the study, outperforming much larger models.

Paper abstract (excerpt):

[…] engineering, degrading maintainability and long-term quality. While traditional TD detection methods rely heavily on static analysis and manual inspection, recent advances in Large Language Models (LLMs) offer a compelling new approach for automating and scaling this process. In this paper, we present DebtGuardian, the first open-source LLM-based framework for detecting TD directly from source code changes. DebtGuardian combines zero-shot and few-shot prompting strategies, supports both granular and batch-level detection, and employs Guardrails-AI for validating and standardizing model outputs. To enhance robustness, it enables majority voting across multiple LLMs. We evaluate DebtGuardian using the MLCQ dataset, a publicly available benchmark comprising over 10,000 real-world code change instances manually annotated with different TD types (design, documentation, testing, etc.). Our study includes state-of-the-art open-source LLMs, both general-purpose and code-specialized. The results demonstrate that granular prompting, code-specialized models, and larger context windows significantly improve TD detection performance. Majority voting boosts recall by 8.17%, showing clear benefits in model ensemble strategies. We also conduct a detailed evaluation of line-level metrics and find that using a 10-line threshold achieves the best balance between precision and tolerance for small discrepancies in predicted TD locations. DebtGuardian advances the field by offering a flexible, extensible, and empirically validated LLM-based solution for TD detection. Our framework paves the way for integrating AI-driven analysis into continuous integration pipelines, making TD management more scalable and accurate in modern software development workflows.

Keywords: Software quality · Technical debt detection · Large language models.

Paper Section 2 (excerpt):

2 DebtGuardian Approach

Our approach (Figure 1) introduces DebtGuardian, a multistage pipeline that identifies and validates technical debt (TD) in source code repositories using large language models (LLMs) and schema-based validation. The pipeline consists of three main steps: (1) source code loading and commit analysis, (2) debt identification, and (3) LLM output validation. The process enables structured, interpretable detection of TD types, even those not easily captured by traditional static analysis tools.

Fig. 1. Overview of DebtGuardian. (figure not reproduced here)
In Step 1 (Figure 1, box 1), DebtGuardian connects to a source code repository and retrieves commits along with associated configuration and source files. It analyzes commit histories to detect modified files, typically indicative of evolving or problematic code regions. These modified files (Figure 1, arrow a) serve as the basis for TD detection and are passed to the next stage.
In Step 2 (Figure 1, box 2), DebtGuardian constructs LLM prompts that combine the modified code files with a structured technical debt schema. This schema, defined using Pydantic, captures core properties of TD such as the debt type, symptom, location, and suggested remediation (see Figure 2, box 4). By embedding this schema into the prompt alongside the code snippet (entire Figure 2), DebtGuardian ensures that the LLM response is both syntactically and semantically aligned with the intended output format. The use of prompting strategies further enhances accuracy, generalization, and interpretability.
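The schema-in-prompt idea from Step 2 can be sketched as follows. The schema field names and prompt wording here are assumptions for illustration, not the framework's actual templates:

```python
import json

# Illustrative TD schema, embedded into the prompt so the LLM replies
# with JSON we can validate afterwards.
schema = {
    "debt_type": "one of: design, documentation, defect, test, "
                 "compatibility, build, requirement",
    "symptom": "string",
    "location": "string, e.g. file:line",
    "severity": "low | medium | high",
    "remediation": "string",
}
code_diff = "+def calc(x, y):\n+    return x / y"

prompt = (
    "Identify technical debt in the following diff. "
    "Respond only with JSON matching this schema:\n"
    + json.dumps(schema, indent=2)
    + "\n\nDiff:\n" + code_diff
)
```

Because the schema travels inside the prompt, the later validation step can parse the response against the same field set.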
Prompting Strategies in DebtGuardian:

– Zero-shot prompting: Prompts are issued to the LLM without examples, relying entirely on the model's pretrained understanding of technical debt concepts.
– Few-shot prompting: Prompts include illustrative examples of annotated technical debt instances, enabling the LLM to generalize more effectively across similar patterns in the input.
– Batch prompting: A single prompt includes instructions for detecting multiple debt types at once, improving efficiency and enabling the model to reason over interrelated patterns.
– Granular prompting: Prompts are narrowly focused on one debt type per request, allowing for fine-tuned and high-precision analysis in targeted scenarios.
– Majority voting: Results from multiple LLM runs are aggregated, and only the most frequently reported debt instances are retained to increase robustness and reduce stochastic variance.
– Strategy combination: Strategies such as few-shot learning and batch prompting can be combined wit...
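The majority-voting strategy above can be sketched minimally. Comparing findings by `(debt_type, location)` is an assumption; the framework may use a different identity criterion:

```python
from collections import Counter

def majority_vote(runs, min_votes=2):
    """Keep findings reported by at least min_votes of the LLM runs.
    Findings are identified by (debt_type, location) - an assumption."""
    counts = Counter(
        (f["debt_type"], f["location"]) for run in runs for f in run
    )
    return [key for key, n in counts.items() if n >= min_votes]

# Three runs over the same diff; only the defect appears in a majority.
runs = [
    [{"debt_type": "defect", "location": "calc.py:2"}],
    [{"debt_type": "defect", "location": "calc.py:2"},
     {"debt_type": "design", "location": "calc.py:1"}],
    [{"debt_type": "defect", "location": "calc.py:2"}],
]
kept = majority_vote(runs)
# kept == [("defect", "calc.py:2")]
```

Filtering out low-frequency findings like this reduces the stochastic variance the paper attributes single-run LLM output to.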






vercel bot commented Dec 8, 2025

Vercel deployment: project data-forge-test, status Ready, preview available, updated Dec 8, 2025 7:00pm (UTC).

Co-authored-by: Icar0S <39846852+Icar0S@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Set up experimental configuration for DebtGuardian framework" to "Implement DebtGuardian: LLM-based Technical Debt detection framework" on Dec 8, 2025.
Copilot AI requested a review from Icar0S on December 8, 2025 17:21.
Icar0S added the labels "documentation", "wontfix", and "automated" on Mar 7, 2026.
