
Implement DebtGuardian: LLM-based Technical Debt detection framework#22

Draft
Copilot wants to merge 7 commits into main from copilot/setup-experimental-llm-framework

Conversation

Contributor

Copilot AI commented Dec 8, 2025

Implements the DebtGuardian framework for detecting technical debt (TD) in source code changes using Qwen2.5-Coder:7b via Ollama (77% average recall in the paper's evaluation; runs locally). Follows the paper's methodology with a three-stage pipeline: code analysis → LLM detection → output validation.

Core Framework (src/debt_guardian/)

  • Configuration (config.py): Ollama client settings, prompting strategies, TD type selection, majority voting
  • Schemas (schemas/): Pydantic models for 7 TD types (design, documentation, defect, test, compatibility, build, requirement) with location, severity, confidence
  • LLM Client (llm_client.py): Ollama integration with structured JSON parsing and health checks
  • Prompts (prompts/): Zero-shot, few-shot, batch, and granular templates with per-type examples
  • Git Analysis (utils/): Repository connector, commit history, diff extraction, line number parsing
  • Validator (validators/): Pydantic + Guardrails-AI foundation for response validation
  • Detector (detector.py): Main orchestrator supporting single diff, commit, and batch analysis with majority voting
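The shipped schemas are Pydantic models; as a rough illustration of the fields involved (the names below are simplified assumptions, not the actual model definitions), a single detected-debt record might look like:

```python
from dataclasses import dataclass

# Simplified sketch of a TD finding. The framework uses validated Pydantic
# models; real field names and types may differ.
@dataclass
class DebtFinding:
    debt_type: str     # one of: design, documentation, defect, test,
                       # compatibility, build, requirement
    symptom: str       # short description of the observed problem
    location: str      # e.g. "calculator.py:2"
    severity: str      # e.g. "low" / "medium" / "high"
    confidence: float  # 0.0-1.0, as reported by the model

finding = DebtFinding(
    debt_type="defect",
    symptom="possible division by zero",
    location="calculator.py:2",
    severity="high",
    confidence=0.9,
)
```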

REST API

6 endpoints under /api/debt-guardian/:

  • POST /analyze/diff - Analyze code diff
  • POST /analyze/commit/<sha> - Analyze commit
  • POST /analyze/repository - Batch analysis
  • GET /health, /config, /types
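A call to the diff endpoint might be assembled as in the sketch below. The JSON field names and the host/port are assumptions inferred from the `detect_in_diff()` signature; check the route handlers for the real request contract.

```python
import json
from urllib import request

# Hypothetical payload; field names mirror the detect_in_diff() arguments.
payload = {
    "code_diff": "+def calc(x, y):\n+    return x / y",
    "file_path": "calculator.py",
}
req = request.Request(
    "http://localhost:5000/api/debt-guardian/analyze/diff",  # assumed host/port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = request.urlopen(req)  # requires the Flask app and Ollama running
```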

Usage

from debt_guardian.config import DebtGuardianConfig
from debt_guardian.detector import DebtDetector

# Analyze code diff
config = DebtGuardianConfig(
    use_granular_prompting=True,  # Higher precision
    td_types=["design", "defect", "test"]
)
detector = DebtDetector(config)

report = detector.detect_in_diff(
    code_diff="+def calc(x,y):\n+    return x/y",  # Division by zero
    file_path="calculator.py"
)
# report.detected_debts contains: debt_type, symptom, location, severity, confidence, remediation

# Analyze repository
config = DebtGuardianConfig(repo_path="/path/to/repo")
detector = DebtDetector(config)
batch_report = detector.analyze_repository(max_commits=10)

Setup Requirements

  1. Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
  2. Pull model: ollama pull qwen2.5-coder:7b (~4.7GB)
  3. Start service: ollama serve
  4. Install deps: pip install ollama pydantic gitpython guardrails-ai
  5. Verify: python check_setup.py

Testing

  • Unit tests: 12/12 passing (schemas, config, prompts)
  • Example: python examples/analyze_sample.py
  • Setup checker: check_setup.py validates installation

Documentation

  • SETUP_TESTING_GUIDE.md - Complete setup for testing on other projects
  • docs/DEBT_GUARDIAN.md - Full framework reference
  • docs/DEBT_GUARDIAN_QUICKSTART.md - 5-minute quick start
  • IMPLEMENTATION_SUMMARY.md - Technical architecture details

Dependencies

ollama>=0.1.0
pydantic>=2.0.0
guardrails-ai
GitPython

Integrated into existing Flask API at /api/debt-guardian/*.

Original prompt

I need you to create a branch that will serve as an experimental setup. The goal is to configure the framework presented in the paper "Detecting Technical Debt in Source Code Changes using Large Language Models". If that succeeds, we will be able to test this framework configuration on other projects. We want to use only a single LLM that runs locally, preferably Qwen2.5-Coder:7b (via Ollama); you would be using the model that achieved 77% average recall in the study, outperforming much larger models.

Paper abstract (excerpt):

[…] engineering, degrading maintainability and long-term quality. While traditional TD detection methods rely heavily on static analysis and manual inspection, recent advances in Large Language Models (LLMs) offer a compelling new approach for automating and scaling this process. In this paper, we present DebtGuardian, the first open-source LLM-based framework for detecting TD directly from source code changes. DebtGuardian combines zero-shot and few-shot prompting strategies, supports both granular and batch-level detection, and employs Guardrails-AI for validating and standardizing model outputs. To enhance robustness, it enables majority voting across multiple LLMs. We evaluate DebtGuardian using the MLCQ dataset, a publicly available benchmark comprising over 10,000 real-world code change instances manually annotated with different TD types (design, documentation, testing, etc.). Our study includes state-of-the-art open-source LLMs, both general-purpose and code-specialized. The results demonstrate that granular prompting, code-specialized models, and larger context windows significantly improve TD detection performance. Majority voting boosts recall by 8.17%, showing clear benefits in model ensemble strategies. We also conduct a detailed evaluation of line-level metrics and find that using a 10-line threshold achieves the best balance between precision and tolerance for small discrepancies in predicted TD locations. DebtGuardian advances the field by offering a flexible, extensible, and empirically validated LLM-based solution for TD detection. Our framework paves the way for integrating AI-driven analysis into continuous integration pipelines, making TD management more scalable and accurate in modern software development workflows.

Keywords: Software quality · Technical debt detection · Large language models.

Paper Section 2 (excerpt):

2 DebtGuardian Approach

Our approach (Figure 1) introduces DebtGuardian, a multistage pipeline that identifies and validates technical debt (TD) in source code repositories using large language models (LLMs) and schema-based validation. The pipeline consists of three main steps: (1) source code loading and commit analysis, (2) debt identification, and (3) LLM output validation. The process enables structured, interpretable detection of TD types, even those not easily captured by traditional static analysis tools.

Fig. 1. Overview of DebtGuardian. (figure not reproduced here)
In Step 1 (Figure 1, box 1), DebtGuardian connects to a source code repository and retrieves commits along with associated configuration and source files. It analyzes commit histories to detect modified files, typically indicative of evolving or problematic code regions. These modified files (Figure 1, arrow a) serve as the basis for TD detection and are passed to the next stage.
In Step 2 (Figure 1, box 2), DebtGuardian constructs LLM prompts that combine the modified code files with a structured technical debt schema. This schema, defined using Pydantic, captures core properties of TD such as the debt type, symptom, location, and suggested remediation (see Figure 2, box 4). By embedding this schema into the prompt alongside the code snippet (entire Figure 2), DebtGuardian ensures that the LLM response is both syntactically and semantically aligned with the intended output format. The use of prompting strategies further enhances accuracy, generalization, and interpretability.
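The schema-in-prompt idea from Step 2 can be sketched as follows. The schema field names and prompt wording here are assumptions for illustration, not the framework's actual templates:

```python
import json

# Illustrative TD schema, embedded into the prompt so the LLM replies
# with JSON we can validate afterwards.
schema = {
    "debt_type": "one of: design, documentation, defect, test, "
                 "compatibility, build, requirement",
    "symptom": "string",
    "location": "string, e.g. file:line",
    "severity": "low | medium | high",
    "remediation": "string",
}
code_diff = "+def calc(x, y):\n+    return x / y"

prompt = (
    "Identify technical debt in the following diff. "
    "Respond only with JSON matching this schema:\n"
    + json.dumps(schema, indent=2)
    + "\n\nDiff:\n" + code_diff
)
```

Because the schema travels inside the prompt, the later validation step can parse the response against the same field set.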
Prompting Strategies in DebtGuardian:

– Zero-shot prompting: Prompts are issued to the LLM without examples, relying entirely on the model's pretrained understanding of technical debt concepts.
– Few-shot prompting: Prompts include illustrative examples of annotated technical debt instances, enabling the LLM to generalize more effectively across similar patterns in the input.
– Batch prompting: A single prompt includes instructions for detecting multiple debt types at once, improving efficiency and enabling the model to reason over interrelated patterns.
– Granular prompting: Prompts are narrowly focused on one debt type per request, allowing for fine-tuned and high-precision analysis in targeted scenarios.
– Majority voting: Results from multiple LLM runs are aggregated, and only the most frequently reported debt instances are retained to increase robustness and reduce stochastic variance.
– Strategy combination: Strategies such as few-shot learning and batch prompting can be combined wit...
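The majority-voting strategy above can be sketched minimally. Comparing findings by `(debt_type, location)` is an assumption; the framework may use a different identity criterion:

```python
from collections import Counter

def majority_vote(runs, min_votes=2):
    """Keep findings reported by at least min_votes of the LLM runs.
    Findings are identified by (debt_type, location) - an assumption."""
    counts = Counter(
        (f["debt_type"], f["location"]) for run in runs for f in run
    )
    return [key for key, n in counts.items() if n >= min_votes]

# Three runs over the same diff; only the defect appears in a majority.
runs = [
    [{"debt_type": "defect", "location": "calc.py:2"}],
    [{"debt_type": "defect", "location": "calc.py:2"},
     {"debt_type": "design", "location": "calc.py:1"}],
    [{"debt_type": "defect", "location": "calc.py:2"}],
]
kept = majority_vote(runs)
# kept == [("defect", "calc.py:2")]
```

Filtering out low-frequency findings like this reduces the stochastic variance the paper attributes single-run LLM output to.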






vercel bot commented Dec 8, 2025

Vercel deployment: project data-forge-test, status Ready, preview available, updated Dec 8, 2025 7:00pm (UTC).

Co-authored-by: Icar0S <39846852+Icar0S@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Set up experimental configuration for DebtGuardian framework" to "Implement DebtGuardian: LLM-based Technical Debt detection framework" on Dec 8, 2025.
Copilot AI requested a review from Icar0S on December 8, 2025 17:21.
Icar0S added the labels "documentation", "wontfix", and "automated" on Mar 7, 2026.
