research: LoRA-augmented Gemma 4 for safer code generation #2

## What

Train a small LoRA adapter (r=8, λ_LM=0.01) jointly with the probe on Gemma 4, following Obeso et al. §3 (https://arxiv.org/abs/2509.03531). Use the chosen-vs-rejected pairs from `CyberNative/Code_Vulnerability_Security_DPO` as training data, so that the model defaults to safer patterns when answering ambiguous "quick prototype" prompts.
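A minimal sketch of how the pairs and the joint objective could fit together. It rests on assumptions that are not confirmed here: that each dataset row exposes `question`, `chosen`, and `rejected` fields, that the probe is a binary classifier where label 1 means "vulnerable", and that λ_LM scales a language-modelling term added to the probe loss. The function names are hypothetical; check §3 of the paper for the exact form of the objective.

```python
import math

def pairs_to_examples(row):
    """Expand one chosen/rejected row into two labelled probe examples.

    Assumption: the row has 'question', 'chosen', 'rejected' keys.
    Label 0 = safe (chosen), label 1 = vulnerable (rejected).
    """
    prompt = row["question"]
    return [
        (prompt + "\n" + row["chosen"], 0),
        (prompt + "\n" + row["rejected"], 1),
    ]

def binary_cross_entropy(prob, label):
    """Cross-entropy for one probe prediction, clamped for numerical safety."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def joint_loss(probe_probs, labels, lm_token_logprobs, lambda_lm=0.01):
    """Assumed joint objective: mean probe loss + lambda_lm * mean LM loss."""
    probe_term = sum(
        binary_cross_entropy(p, y) for p, y in zip(probe_probs, labels)
    ) / len(labels)
    # Negative mean log-likelihood of the reference tokens.
    lm_term = -sum(lm_token_logprobs) / len(lm_token_logprobs)
    return probe_term + lambda_lm * lm_term
```

With λ_LM=0.01 the language-modelling term acts as a light regulariser: the probe signal dominates the gradient, while the adapter is discouraged from drifting far from fluent text.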

## Why

The paper shows that joint probe + LoRA training makes models more conservative: they "more readily acknowledge uncertainty, explicitly recognize when they might be generating unreliable information." The cybersecurity analogue is a model that emits a guarded version by default (parameterised query, bounds-checked, authenticated) rather than the shortest textbook-but-vulnerable answer.
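To make the "guarded by default" idea concrete, here is an illustrative contrast between the shortest answer and a parameterised query, using Python's standard `sqlite3` module. The table, data, and function names are made up for the example and do not come from the dataset or the demo prompts.

```python
import sqlite3

def make_db():
    """Build a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn

def lookup_vulnerable(conn, name):
    """Textbook-but-vulnerable: user input is spliced into the SQL string."""
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]

def lookup_guarded(conn, name):
    """Guarded version: the driver binds the value, so it is never parsed as SQL."""
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

conn = make_db()
payload = "x' OR '1'='1"
leaked = lookup_vulnerable(conn, payload)   # injection matches every row
blocked = lookup_guarded(conn, payload)     # no user has this literal name
```

The two functions differ by one line, which is why a model asked for a "quick prototype" can plausibly pick either; the adapter is meant to tilt that choice toward the second.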

## Status / context

- `scripts/train_lora.py` — in-progress
- `scripts/eval_lora.py` — eval comparison harness
- Dataset: `CyberNative/Code_Vulnerability_Security_DPO`
- Base model: Gemma 4 E2B

## Definition of done

- LoRA-fine-tuned Gemma 4 generates safer code than the base model on the 5 demo prompts, verified via Semgrep pattern matches
- Adapter ships alongside the probe (loadable from `data/lora/`)
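
One possible shape for the Semgrep check, as a sketch rather than the contents of `scripts/eval_lora.py`: it assumes the `semgrep` CLI is installed, that generated code for each model has been written to a directory, and that "safer" means strictly fewer findings in Semgrep's JSON `results` array. The helper names and the `auto` config choice are hypothetical.

```python
import json
import subprocess

def run_semgrep(path, config="auto"):
    """Run the Semgrep CLI on a path and return its raw JSON output.

    Assumption: `semgrep` is on PATH; `config` is any ruleset it accepts.
    """
    proc = subprocess.run(
        ["semgrep", "scan", "--config", config, "--json", path],
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.stdout

def count_findings(semgrep_json):
    """Count entries in the top-level 'results' list of Semgrep's JSON report."""
    return len(json.loads(semgrep_json).get("results", []))

def compare_models(base_json, lora_json):
    """Summarise findings for base vs. LoRA output; fewer findings = safer."""
    base_n = count_findings(base_json)
    lora_n = count_findings(lora_json)
    return {"base": base_n, "lora": lora_n, "safer": lora_n < base_n}
```

In use, `compare_models(run_semgrep("out/base"), run_semgrep("out/lora"))` would give one summary per run, where `out/base` and `out/lora` are placeholder directories. Counting findings is a coarse proxy, so a per-prompt breakdown may be worth adding once the 5 demo prompts are fixed.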
