research: LoRA-augmented Gemma 4 for safer code generation #2

@peaktwilight

What

Train a small LoRA adapter (rank r=8, λ_LM=0.01) jointly with the probe on Gemma 4, following Obeso et al. §3 (https://arxiv.org/abs/2509.03531). Train on the chosen-vs-rejected pairs from CyberNative/Code_Vulnerability_Security_DPO so that the model defaults to safer patterns when answering ambiguous "quick prototype" prompts.
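A minimal sketch of how the joint objective could be combined, assuming it is the probe's binary classification loss plus λ_LM times a language-modelling loss. That form is an assumption for illustration; the exact objective in the paper and in scripts/train_lora.py may differ, and the function names below are hypothetical.

```python
import math

LAMBDA_LM = 0.01  # weight on the language-modelling term, taken from the issue text


def probe_bce(probe_logit: float, is_vulnerable: bool) -> float:
    """Binary cross-entropy with logits for one probe prediction."""
    # softplus(x) = log(1 + exp(x)), written so that exp() cannot overflow
    softplus = max(probe_logit, 0.0) + math.log1p(math.exp(-abs(probe_logit)))
    # BCE with logits is softplus(x) - y * x, with y in {0, 1}
    return softplus - (probe_logit if is_vulnerable else 0.0)


def lm_nll(token_logprobs: list) -> float:
    """Mean negative log-likelihood over the target tokens."""
    return -sum(token_logprobs) / len(token_logprobs)


def joint_loss(probe_logit, is_vulnerable, token_logprobs, lambda_lm=LAMBDA_LM):
    """Assumed combined objective: probe loss + lambda_lm * LM loss."""
    return probe_bce(probe_logit, is_vulnerable) + lambda_lm * lm_nll(token_logprobs)
```

With λ_LM this small, the probe term dominates the gradient and the LM term mostly acts as a regulariser that keeps the adapter from drifting far from the base model's outputs.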

Why

The paper shows that joint probe + LoRA training makes models more conservative: "more readily acknowledge uncertainty, explicitly recognize when they might be generating unreliable information." The cybersecurity analogue is a model that emits the guarded version by default (parameterised queries, bounds checks, authentication) rather than the shortest textbook-but-vulnerable answer.
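To make the "guarded vs. shortest answer" distinction concrete, here is a small illustration using Python's standard sqlite3 module. The table, function names, and payload are hypothetical and are not taken from the demo prompts.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Guarded: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def make_demo_db():
    # In-memory database with two rows, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn
```

Passing the input `x' OR '1'='1` to `find_user_unsafe` returns every row, while `find_user_safe` treats the same string as a literal name and returns nothing.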

Status / context

  • scripts/train_lora.py — in-progress
  • scripts/eval_lora.py — eval comparison harness
  • Dataset: CyberNative/Code_Vulnerability_Security_DPO
  • Base model: Gemma 4 E2B
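One way the DPO pairs could be turned into labelled probe examples is sketched below. It assumes each record exposes `question`, `chosen`, and `rejected` fields, and that the rejected completion is the vulnerable one; check both against the dataset card before relying on this. The function name and output keys are hypothetical.

```python
from typing import Iterable, Iterator


def to_probe_examples(records: Iterable[dict]) -> Iterator[dict]:
    """Expand each chosen/rejected pair into two labelled examples.

    label 0 = safe (chosen), label 1 = vulnerable (rejected).
    Field names are assumptions about the dataset schema.
    """
    for rec in records:
        prompt = rec["question"]
        yield {"prompt": prompt, "completion": rec["chosen"], "label": 0}
        yield {"prompt": prompt, "completion": rec["rejected"], "label": 1}
```

The records would typically come from loading the dataset with the Hugging Face `datasets` library; the function itself only needs an iterable of dicts, so it can be tested without any download.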

Definition of done

  • LoRA-fine-tuned Gemma 4 generates demonstrably safer code on the 5 demo prompts vs. base (verified via Semgrep pattern match)
  • Adapter ships alongside the probe (loadable from data/lora/)
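The Semgrep check in the first criterion could be scripted roughly as follows. This is a sketch, not the contents of scripts/eval_lora.py: the helper names, the "fewer findings means safer" rule, and the default `auto` config are assumptions, and Semgrep must be installed for `run_semgrep` to work.

```python
import json
import subprocess


def count_findings(semgrep_json: str) -> int:
    """Count entries in the "results" list of Semgrep's JSON output."""
    return len(json.loads(semgrep_json).get("results", []))


def run_semgrep(path: str, config: str = "auto") -> str:
    """Scan a file or directory and return Semgrep's JSON report as text."""
    proc = subprocess.run(
        ["semgrep", "scan", "--config", config, "--json", path],
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit; the caller inspects the JSON
    )
    return proc.stdout


def is_safer(base_json: str, lora_json: str) -> bool:
    """Assumed criterion: the LoRA output triggers strictly fewer findings."""
    return count_findings(lora_json) < count_findings(base_json)
```

A typical use would write the base and LoRA completions for each demo prompt to separate files, call `run_semgrep` on each, and compare the two reports with `is_safer`. Pinning a fixed rule set instead of `auto` would make the comparison reproducible across runs.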

Labels: research (Research / experiments / paper-tracking)
