PoisonLoRA Artifact

This repository provides a cleaned and configuration-driven research artifact for the paper Customization under Fire: Plugin Poisoning in Text-to-Image Ecosystem. The code is intended for reproducibility, artifact review, and defensive research on text-to-image plugin supply-chain risks.

The public release is intentionally sanitized: examples use synthetic toy data only, and the repository does not include platform-deployment code, private experiment logs, private model weights, or real harmful target configurations.

Contents

Concept-hijacking distillation: a compact teacher/student LoRA distillation scaffold and calibrated objective functions.
Attention steering: data-free editing utilities for cross-attention K/V LoRA projections.
LoRA utilities: helpers for loading, merging, and saving .safetensors LoRA weights.
Generation utilities: Diffusers pipeline loading, LoRA attachment, and prompt-based image generation.
Evaluation utilities: a lightweight CLIP image-text score helper.
Toy data utilities: a small synthetic concept-hijacking dataset and an overlay-dataset builder.
Configuration utilities: .env and YAML-based path/model configuration.

Responsible-use note

This artifact is dual-use. It is released to support reproducibility and defensive analysis, not to enable abuse of public model-sharing platforms or unsuspecting users. Public examples are limited to benign synthetic concepts.

Before redistributing or extending this repository, please verify that:

.env and private credentials are not committed.
Local absolute paths, private datasets, private model weights, and generated harmful examples are not committed.
Public examples remain synthetic or otherwise clearly licensed and benign.
Any controlled-risk experiments are performed only in approved, private research settings.

Method overview

The public code exposes two method families described in the paper:

| Method family | Purpose | Public modules |
| --- | --- | --- |
| Concept-hijacking distillation | Distill a benign plugin while associating a controlled trigger concept with a target visual concept | `poisonlora.distillation`, `poisonlora.losses`, `poisonlora.overlay` |
| Attention steering | Edit cross-attention K/V LoRA projections to map a trigger embedding toward a target embedding while preserving benign anchors | `poisonlora.steering`, `poisonlora.robust` |

Sanitized method metadata and default hyperparameters are recorded in configs/methods.yaml. The defaults reflect the final internal experimental settings at a high level: rank 128, 512px training resolution, Adam/AdamW defaults, SNR gamma 5.0 for distillation-style training, and 20-step Adam editing for attention steering.

Real-world experiment naming note

For real-world platform experiments, the LoRA identifier used inside training code is often not identical to the platform-facing name. A single model may have several names across the workflow, for example:

internal training key used in YAML/config files;
local checkpoint directory or .safetensors filename;
Diffusers adapter name loaded at inference time;
converted or renamed LoRA filename;
platform display name, model ID, or URL;
user-facing trigger words listed on a platform page.

For reproducibility, maintain a separate private mapping table that links platform identifiers to internal training keys and local checkpoints. This mapping is needed whenever comparing local training logs, converted LoRA files, uploaded platform entries, and generated samples; otherwise, the same model may appear under multiple names and become difficult to trace accurately.

The public artifact does not include the original platform mapping table because it may contain private paths, platform-specific metadata, non-public experiment assets, or identifiers that are unnecessary for reproducing the method code. A sanitized schema is documented in configs/methods.yaml under real_world_experiment_note, and a fictional placeholder template is provided in configs/platform_mapping.example.yaml. Keep any filled real-world mapping file private and add it to .gitignore if you use a different filename.
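
As an illustration of why such a mapping helps, the following is a minimal sketch of an alias index that resolves any of a model's names back to one record. The `LoraRecord` fields and the entry values are hypothetical placeholders chosen for this example; they are not the schema in configs/methods.yaml or configs/platform_mapping.example.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRecord:
    """One model and the names it carries across the workflow (hypothetical schema)."""
    training_key: str          # internal key used in YAML/config files
    checkpoint_file: str       # local .safetensors filename
    adapter_name: str          # Diffusers adapter name used at inference time
    platform_name: str         # platform-facing display name
    extra_aliases: tuple = ()  # converted filenames, model IDs, URLs, ...


def build_alias_index(records):
    """Map every known name to its record and reject ambiguous aliases."""
    index = {}
    for rec in records:
        names = {rec.training_key, rec.checkpoint_file, rec.adapter_name,
                 rec.platform_name, *rec.extra_aliases}
        for name in names:
            if name in index and index[name] is not rec:
                raise ValueError(f"alias {name!r} maps to more than one model")
            index[name] = rec
    return index


# Fictional placeholder entry; a filled real-world mapping stays private.
records = [
    LoraRecord(
        training_key="toy_brand_r128",
        checkpoint_file="toy_brand_r128.safetensors",
        adapter_name="toy_brand",
        platform_name="Toy Brand Demo",
        extra_aliases=("toy_brand_converted.safetensors",),
    )
]
index = build_alias_index(records)
print(index["Toy Brand Demo"].training_key)
```

Raising on a duplicate alias is a deliberate choice in this sketch: a name that points at two models is exactly the traceability problem the mapping is meant to prevent.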

Repository layout

poisonlora-open/
├── configs/
│   ├── default.yaml
│   ├── methods.yaml
│   └── platform_mapping.example.yaml
├── examples/
│   ├── prompts.json
│   └── concept_hijacking/toy_brand/
├── poisonlora/
│   ├── config.py
│   ├── data.py
│   ├── distillation.py
│   ├── generation.py
│   ├── lora_ops.py
│   ├── losses.py
│   ├── metrics.py
│   ├── models.py
│   ├── overlay.py
│   ├── robust.py
│   ├── safety.py
│   └── steering.py
├── scripts/
│   ├── build_overlay_dataset.py
│   ├── clip_score.py
│   ├── generate.py
│   ├── merge_lora.py
│   └── steer_attention.py
├── .env.example
├── .gitignore
├── CITATION.bib
├── LICENSE
├── pyproject.toml
├── requirements.txt
└── README.md

Installation

git clone https://github.com/xaddwell/PoisonLoRA.git
cd PoisonLoRA
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
pip install -e .

For GPU training or generation, install the PyTorch build matching your CUDA version before running pip install -r requirements.txt, so that the remaining dependencies resolve against that build.

Configuration

Copy the example environment file and edit local paths as needed:

cp .env.example .env

Example .env values:

HF_HOME=./.cache/huggingface
WANDB_MODE=offline
DATA_ROOT=./data
OUTPUT_ROOT=./outputs
LORA_ROOT=./loras
ALLOW_RISKY_RESEARCH=0

Do not commit .env.
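
For readers unfamiliar with the format, here is a minimal sketch of how KEY=VALUE lines like the ones above can be parsed and applied with the standard library only. The function names are hypothetical, and how `poisonlora.config` actually loads the file is not shown in this README.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values, environ=None, override=False):
    """Copy parsed values into an environment mapping; existing keys win by default."""
    environ = os.environ if environ is None else environ
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value
    return environ


sample = """
# local paths
DATA_ROOT=./data
OUTPUT_ROOT=./outputs
ALLOW_RISKY_RESEARCH=0
"""
# A plain dict stands in for os.environ so the example has no side effects.
env = apply_env(parse_env_text(sample), environ={"DATA_ROOT": "/mnt/data"})
print(env)
```

In this sketch, values already present in the environment take precedence over the file, which lets a shell export override a local .env without editing it.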

The default YAML config is configs/default.yaml. It supports environment-variable expansion:

paths:
  data_root: ${DATA_ROOT:-./data}
  output_root: ${OUTPUT_ROOT:-./outputs}
  lora_root: ${LORA_ROOT:-./loras}
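
The `${VAR:-default}` form follows the shell convention: use the variable if it is set and non-empty, otherwise fall back to the default. A minimal sketch of that expansion over an already-parsed config is shown below; `expand_env` and `expand_tree` are hypothetical names, and the repository's own loader may differ in details such as how it treats unset variables without a default.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value, environ=None):
    """Expand ${NAME} and ${NAME:-default} references inside one string."""
    environ = os.environ if environ is None else environ

    def replace(match):
        name, default = match.group(1), match.group(2)
        current = environ.get(name)
        if current:                      # set and non-empty
            return current
        if default is not None:          # unset or empty, default given
            return default
        if current is None:              # unset, no default: fail loudly
            raise KeyError(f"environment variable {name} is not set")
        return current                   # set but empty, no default

    return _VAR.sub(replace, value)


def expand_tree(node, environ=None):
    """Recursively expand strings inside nested dicts and lists."""
    if isinstance(node, dict):
        return {k: expand_tree(v, environ) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_tree(v, environ) for v in node]
    if isinstance(node, str):
        return expand_env(node, environ)
    return node


config = {"paths": {"data_root": "${DATA_ROOT:-./data}",
                    "lora_root": "${LORA_ROOT:-./loras}"}}
expanded = expand_tree(config, environ={"DATA_ROOT": "/mnt/data"})
print(expanded)
```

Raising on an unset variable with no default is a choice made for this sketch; a shell would substitute an empty string instead.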

Public toy concept-hijacking data

A small synthetic dataset is included for smoke tests and format illustration:

examples/concept_hijacking/toy_brand/
├── benign/                 # clean toy images with same-stem .txt captions
├── poisoned/               # same scenes with a fictional ACME TOY visual concept
├── targets/acme_toy_logo.png
└── metadata.json

The target visual is a fictional logo created for this artifact. It is not a real brand and does not contain sensitive or harmful content. The data is suitable for GitHub publication, but it is not intended to reproduce paper metrics.

You can rebuild a toy overlay dataset with:

python scripts/build_overlay_dataset.py \
  --input-dir examples/concept_hijacking/toy_brand/benign \
  --target-image examples/concept_hijacking/toy_brand/targets/acme_toy_logo.png \
  --output-dir outputs/toy_brand_overlay \
  --trigger-token toybrand

Quick start: generation with no LoRA

python scripts/generate.py \
  --config configs/default.yaml \
  --env .env \
  --output-dir outputs/samples

Prompts are read from examples/prompts.json by default. Generated images and sidecar .txt captions are saved in the output directory.

Generation with a local LoRA

Add a LoRA entry to a private YAML config, for example configs/my_local.yaml:

loras:
  - path: ${LORA_ROOT}/example_lora
    weight_name: pytorch_lora_weights.safetensors
    adapter_name: example
    scale: 0.8

Then run:

python scripts/generate.py --config configs/my_local.yaml --env .env
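
For orientation, the sketch below shows one way entries shaped like the `loras` list above could be attached to a Diffusers pipeline. `attach_loras` is a hypothetical helper, not the code in `poisonlora.generation`; it relies only on the Diffusers methods `load_lora_weights` and `set_adapters`, and it assumes a missing `scale` means 1.0.

```python
def attach_loras(pipe, lora_entries):
    """Load each LoRA entry into a pipeline, then activate all with their scales.

    `pipe` is expected to be a Diffusers pipeline with LoRA support.
    Each entry is a dict with `path`, `weight_name`, `adapter_name`,
    and an optional `scale` (assumed to default to 1.0 in this sketch).
    """
    names, scales = [], []
    for entry in lora_entries:
        pipe.load_lora_weights(
            entry["path"],
            weight_name=entry["weight_name"],
            adapter_name=entry["adapter_name"],
        )
        names.append(entry["adapter_name"])
        scales.append(float(entry.get("scale", 1.0)))
    if names:
        # Activate every loaded adapter at its configured strength.
        pipe.set_adapters(names, adapter_weights=scales)
    return names, scales
```

With a real pipeline, the call would look like `attach_loras(pipe, config["loras"])` after the YAML has been loaded and its environment variables expanded. Multi-adapter support in Diffusers depends on the PEFT backend being installed.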

Merge LoRA weights

python scripts/merge_lora.py \
  --inputs loras/a.safetensors loras/b.safetensors \
  --weights 1.0 0.5 \
  --output outputs/merged.safetensors
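
To make the `--inputs` and `--weights` pairing concrete, here is a minimal sketch of a key-wise weighted sum over state dicts. This is an assumption about what a merge can look like, not a description of what scripts/merge_lora.py does; `merge_state_dicts` is a hypothetical name. It works on any values that support `*` and `+`, so plain floats are used here to keep the example dependency-free.

```python
def merge_state_dicts(state_dicts, weights):
    """Key-wise weighted sum; keys missing from one input are simply skipped for it."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per input state dict")
    merged = {}
    for state_dict, weight in zip(state_dicts, weights):
        for key, tensor in state_dict.items():
            scaled = tensor * weight
            merged[key] = scaled if key not in merged else merged[key] + scaled
    return merged


# Floats stand in for tensors in this toy example.
a = {"layer.x": 2.0, "layer.y": 1.0}
b = {"layer.x": 4.0}
merged = merge_state_dicts([a, b], [1.0, 0.5])
print(merged)
```

With real files, the inputs would be dicts of tensors such as those returned by `safetensors.torch.load_file`, and the result could be written with `safetensors.torch.save_file`. Note that a key-wise sum of LoRA factor matrices is a simplification: scaling both low-rank factors by a weight scales their product by the weight squared, summing factors across adapters introduces cross terms, and inputs with different ranks will not have matching shapes.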

Attention steering API

The attention-steering utility edits the cross-attention K/V projections of a loaded LoRA adapter. Public examples should use benign toy concepts:

python scripts/steer_attention.py \
  --base-model runwayml/stable-diffusion-v1-5 \
  --lora-path loras/example_lora \
  --weight-name pytorch_lora_weights.safetensors \
  --adapter-name example \
  --trigger "toytrigger" \
  --target "watercolor style" \
  --anchor-prompts examples/prompts.json \
  --output-dir outputs/steered_lora

By default, the public safety guard rejects obviously risky trigger or target terms. Keep this guard enabled for public demos and tutorials.
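
The general shape of such a guard can be sketched as a token-level blocklist check applied to both strings before any editing starts. The code below is an illustrative assumption, not the implementation in `poisonlora.safety`; the function names are hypothetical and the blocklist entries are meaningless placeholders.

```python
import re

# Placeholder entries only; a real guard would maintain its own term list.
BLOCKED_TERMS = frozenset({"blockedterm", "anotherblockedterm"})


def find_blocked_terms(text, blocked=BLOCKED_TERMS):
    """Return the blocked tokens that appear in `text`, case-insensitively."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(tokens & blocked)


def check_prompt_pair(trigger, target, blocked=BLOCKED_TERMS):
    """Raise ValueError if the trigger or the target contains a blocked term."""
    hits = find_blocked_terms(trigger, blocked) + find_blocked_terms(target, blocked)
    if hits:
        raise ValueError(f"refusing to run: blocked terms {hits}")


check_prompt_pair("toytrigger", "watercolor style")  # passes silently
```

Matching whole tokens rather than substrings keeps benign words that merely contain a blocked string from being rejected, at the cost of missing obfuscated spellings.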

Distillation API

The compact distillation loop is exposed as a Python API. It does not include target-construction logic; callers provide their own dataloader and, when appropriate for a controlled experiment, a private batch-construction function.

from poisonlora.distillation import DistillationConfig, train_poisonous_distillation

cfg = DistillationConfig(
    output_dir="outputs/student_lora",
    rank=128,
    learning_rate=1e-4,
    max_train_steps=500,
    use_sam=False,
)

train_poisonous_distillation(
    unet=student_unet,
    teacher_unet=teacher_unet,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    train_dataloader=train_dataloader,
    cfg=cfg,
    target_batch_builder=None,
    noise_scheduler=noise_scheduler,
)

CLIP score

python scripts/clip_score.py --image-dir outputs/samples

The script expects each image to have a same-stem .txt prompt file.
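
The same-stem convention can be checked before scoring with a short helper like the one sketched below. `collect_image_prompt_pairs` and the set of image suffixes are assumptions made for this example; the script's own discovery logic may accept other formats.

```python
from pathlib import Path

# Assumed image extensions for this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def collect_image_prompt_pairs(image_dir):
    """Pair each image with the text of its same-stem .txt file.

    Returns (pairs, missing): `pairs` is a list of (image_path, prompt)
    tuples and `missing` lists images that have no prompt file.
    """
    pairs, missing = [], []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.is_file():
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
        else:
            missing.append(image_path)
    return pairs, missing
```

Calling `collect_image_prompt_pairs("outputs/samples")` after a generation run should return an empty `missing` list, since the generation script writes a sidecar caption for every image.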

From the internal workspace to this artifact

The original research workspace contained many one-off scripts, local absolute paths, generated images, private tokens, platform screenshots, logs, and rebuttal experiments. This public artifact keeps only the reusable method components and safe toy examples. Internal paths and task-specific private configs have been replaced by .env and YAML configuration.

Citation

If you use this artifact, please cite our paper:

@misc{chen2026customizationfirepluginpoisoning,
  title         = {Customization under Fire: Plugin Poisoning in Text-to-Image Ecosystem},
  author        = {Jiahao Chen and Xing He and Yong Yang and Xinfeng Li and Chunyi Zhou and Junhao Li and Zhe Ma and Tianyu Du and Shouling Ji},
  year          = {2026},
  eprint        = {2606.09151},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CR},
  url           = {https://arxiv.org/abs/2606.09151}
}

License

This artifact is released under the Research-Only Responsible Use License in LICENSE. Please review the responsible-use restrictions before redistribution or publication.
