This repository provides a cleaned, configuration-driven research artifact for the paper "Customization under Fire: Plugin Poisoning in Text-to-Image Ecosystem". The code is intended for reproducibility, artifact review, and defensive research on text-to-image plugin supply-chain risks.
The public release is intentionally sanitized: examples use synthetic toy data only, and the repository does not include platform-deployment code, private experiment logs, private model weights, or real harmful target configurations.
The repository contains modular building blocks for LoRA-based text-to-image research:
- Concept-hijacking distillation: a compact teacher/student LoRA distillation scaffold and calibrated objective functions.
- Attention steering: data-free editing utilities for cross-attention K/V LoRA projections.
- LoRA utilities: helpers for loading, merging, and saving .safetensors LoRA weights.
- Generation utilities: Diffusers pipeline loading, LoRA attachment, and prompt-based image generation.
- Evaluation utilities: a lightweight CLIP image-text score helper.
- Toy data utilities: a small synthetic concept-hijacking dataset and an overlay-dataset builder.
- Configuration utilities: .env and YAML-based path/model configuration.
This artifact is dual-use. It is released to support reproducibility and defensive analysis, not to enable abuse of public model-sharing platforms or unsuspecting users. Public examples are limited to benign synthetic concepts.
Before redistributing or extending this repository, please verify that:
- .env and private credentials are not committed.
- Local absolute paths, private datasets, private model weights, and generated harmful examples are not committed.
- Public examples remain synthetic or otherwise clearly licensed and benign.
- Any controlled-risk experiments are performed only in approved, private research settings.
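Part of the checklist above can be automated before a release. The following is a minimal sketch, not a script shipped with this repository: the function name `scan_release_tree`, the path patterns, and the list of text suffixes are all assumptions chosen for illustration. It flags committed `.env` files and lines that look like local absolute paths.

```python
import re
from pathlib import Path

# Hypothetical patterns for local absolute paths; extend for your environment.
ABS_PATH_RE = re.compile(r"(/home/|/Users/|[A-Za-z]:\\Users\\)")
# Hypothetical set of file types worth scanning as text.
TEXT_SUFFIXES = {".py", ".yaml", ".yml", ".json", ".txt", ".md", ".toml"}


def scan_release_tree(root):
    """Return human-readable findings for files that should not be published."""
    root = Path(root)
    findings = []
    for path in sorted(root.rglob("*")):
        rel_path = path.relative_to(root)
        # Skip directories and anything inside the Git metadata folder.
        if not path.is_file() or ".git" in rel_path.parts:
            continue
        rel = rel_path.as_posix()
        if path.name == ".env":
            findings.append(f"{rel}: .env file present")
            continue
        if path.suffix in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if ABS_PATH_RE.search(line):
                    findings.append(f"{rel}:{lineno}: local absolute path")
    return findings


if __name__ == "__main__":
    for finding in scan_release_tree("."):
        print(finding)
```

A scan like this only catches the mechanical items; whether examples are benign and properly licensed still needs a manual review.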
The public code exposes two method families described in the paper:
| Method family | Purpose | Public modules |
|---|---|---|
| Concept-hijacking distillation | Distill a benign plugin while associating a controlled trigger concept with a target visual concept | poisonlora.distillation, poisonlora.losses, poisonlora.overlay |
| Attention steering | Edit cross-attention K/V LoRA projections to map a trigger embedding toward a target embedding while preserving benign anchors | poisonlora.steering, poisonlora.robust |
Sanitized method metadata and default hyperparameters are recorded in configs/methods.yaml. The defaults reflect the final internal experimental settings at a high level: rank 128, 512px training resolution, Adam/AdamW defaults, SNR gamma 5.0 for distillation-style training, and 20-step Adam editing for attention steering.
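The "SNR gamma" default most likely refers to Min-SNR loss weighting, which is common in diffusion fine-tuning scripts; this README does not state the formula, so treat that reading as an assumption. Under that assumption, and for epsilon-prediction, the per-timestep loss weight is min(SNR, gamma) / SNR, where SNR is computed from the cumulative alpha product of the noise schedule. A minimal sketch with hypothetical function names:

```python
def snr(alpha_bar):
    """Signal-to-noise ratio for a cumulative alpha product in (0, 1)."""
    return alpha_bar / (1.0 - alpha_bar)


def min_snr_weight(alpha_bar, gamma=5.0):
    """Min-SNR loss weight for epsilon-prediction: min(SNR, gamma) / SNR."""
    s = snr(alpha_bar)
    return min(s, gamma) / s


if __name__ == "__main__":
    # Low-noise timesteps (alpha_bar near 1) have high SNR and are down-weighted;
    # timesteps whose SNR is at or below gamma keep a weight of 1.
    for alpha_bar in (0.99, 0.5, 0.01):
        print(alpha_bar, round(min_snr_weight(alpha_bar), 4))
```

The clamp keeps nearly clean timesteps from dominating the loss while leaving noisier timesteps untouched.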
For real-world platform experiments, the LoRA identifier used inside training code is often not identical to the platform-facing name. A single model may have several names across the workflow, for example:
- internal training key used in YAML/config files;
- local checkpoint directory or .safetensors filename;
- Diffusers adapter name loaded at inference time;
- converted or renamed LoRA filename;
- platform display name, model ID, or URL;
- user-facing trigger words listed on a platform page.
For reproducibility, maintain a separate private mapping table that links platform identifiers to internal training keys and local checkpoints. This mapping is needed whenever comparing local training logs, converted LoRA files, uploaded platform entries, and generated samples; otherwise, the same model may appear under multiple names and become difficult to trace accurately.
The public artifact does not include the original platform mapping table because it may contain private paths, platform-specific metadata, non-public experiment assets, or identifiers that are unnecessary for reproducing the method code. A sanitized schema is documented in configs/methods.yaml under real_world_experiment_note, and a fictional placeholder template is provided in configs/platform_mapping.example.yaml. Keep any filled real-world mapping file private and add it to .gitignore if you use a different filename.
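One way to keep such a private mapping consistent is to store one record per model and index it by every name the model goes by. The sketch below is illustrative only: the class `LoraMappingEntry`, its field names, and the sample values are hypothetical and do not reflect the schema in configs/methods.yaml or the template in configs/platform_mapping.example.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraMappingEntry:
    """One row of a private mapping table. All field names are hypothetical."""
    internal_key: str
    checkpoint_path: str
    adapter_name: str
    platform_id: str
    platform_display_name: str
    trigger_words: tuple = ()


def build_index(entries):
    """Index entries by every identifier so any name resolves to one record."""
    index = {}
    for entry in entries:
        names = (
            entry.internal_key,
            entry.adapter_name,
            entry.platform_id,
            entry.platform_display_name,
        )
        for name in names:
            # The same name pointing at two different records is a tracing hazard.
            if name in index and index[name] is not entry:
                raise ValueError(f"ambiguous identifier: {name!r}")
            index[name] = entry
    return index


if __name__ == "__main__":
    # Fictional placeholder values only.
    entries = [
        LoraMappingEntry(
            internal_key="toy_brand_v1",
            checkpoint_path="loras/toy_brand_v1/pytorch_lora_weights.safetensors",
            adapter_name="toybrand",
            platform_id="example-0001",
            platform_display_name="Toy Brand Style",
            trigger_words=("toybrand",),
        )
    ]
    index = build_index(entries)
    print(index["Toy Brand Style"].internal_key)
```

Rejecting ambiguous identifiers at load time surfaces naming collisions early, before they show up as mismatched logs and samples.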
poisonlora-open/
├── configs/
│ ├── default.yaml
│ ├── methods.yaml
│ └── platform_mapping.example.yaml
├── examples/
│ ├── prompts.json
│ └── concept_hijacking/toy_brand/
├── poisonlora/
│ ├── config.py
│ ├── data.py
│ ├── distillation.py
│ ├── generation.py
│ ├── lora_ops.py
│ ├── losses.py
│ ├── metrics.py
│ ├── models.py
│ ├── overlay.py
│ ├── robust.py
│ ├── safety.py
│ └── steering.py
├── scripts/
│ ├── build_overlay_dataset.py
│ ├── clip_score.py
│ ├── generate.py
│ ├── merge_lora.py
│ └── steer_attention.py
├── .env.example
├── .gitignore
├── CITATION.bib
├── LICENSE
├── pyproject.toml
├── requirements.txt
└── README.md
git clone https://github.com/xaddwell/PoisonLoRA.git
cd PoisonLoRA
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
pip install -e .
For GPU training or generation, install the PyTorch build matching your CUDA version before installing the remaining dependencies.
Copy the example environment file and edit local paths as needed:
cp .env.example .env
Example .env values:
HF_HOME=./.cache/huggingface
WANDB_MODE=offline
DATA_ROOT=./data
OUTPUT_ROOT=./outputs
LORA_ROOT=./loras
ALLOW_RISKY_RESEARCH=0
Do not commit .env.
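For readers unfamiliar with the format, the sketch below shows how KEY=VALUE lines like the ones above can be read into the process environment. It uses only the standard library and hypothetical function names (`parse_env_text`, `apply_env`); the repository's own loader in poisonlora/config.py may work differently, for example by relying on a dotenv library.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values, environ=None, override=False):
    """Copy parsed values into environ; existing variables win unless override."""
    environ = os.environ if environ is None else environ
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value
    return environ


if __name__ == "__main__":
    sample = "HF_HOME=./.cache/huggingface\nWANDB_MODE=offline\nALLOW_RISKY_RESEARCH=0\n"
    env = apply_env(parse_env_text(sample), environ={})
    print(env)
```

Letting already-set variables take precedence means a shell export can override the file without editing it.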
The default YAML config is configs/default.yaml. It supports environment-variable expansion:
paths:
  data_root: ${DATA_ROOT:-./data}
  output_root: ${OUTPUT_ROOT:-./outputs}
  lora_root: ${LORA_ROOT:-./loras}
A small synthetic dataset is included for smoke tests and format illustration:
examples/concept_hijacking/toy_brand/
├── benign/ # clean toy images with same-stem .txt captions
├── poisoned/ # same scenes with a fictional ACME TOY visual concept
├── targets/acme_toy_logo.png
└── metadata.json
The target visual is a fictional logo created for this artifact. It is not a real brand and does not contain sensitive or harmful content. The data is suitable for GitHub publication, but it is not intended to reproduce paper metrics.
You can rebuild a toy overlay dataset with:
python scripts/build_overlay_dataset.py \
--input-dir examples/concept_hijacking/toy_brand/benign \
--target-image examples/concept_hijacking/toy_brand/targets/acme_toy_logo.png \
--output-dir outputs/toy_brand_overlay \
--trigger-token toybrand
python scripts/generate.py \
--config configs/default.yaml \
--env .env \
--output-dir outputs/samples
Prompts are read from examples/prompts.json by default. Generated images and sidecar .txt captions are saved in the output directory.
Add a LoRA entry to a private YAML config:
loras:
  - path: ${LORA_ROOT}/example_lora
    weight_name: pytorch_lora_weights.safetensors
    adapter_name: example
    scale: 0.8
Then run:
python scripts/generate.py --config configs/my_local.yaml --env .env
python scripts/merge_lora.py \
--inputs loras/a.safetensors loras/b.safetensors \
--weights 1.0 0.5 \
--output outputs/merged.safetensors
The attention-steering utility edits a loaded LoRA adapter in cross-attention K/V layers. Public examples should use benign toy concepts:
python scripts/steer_attention.py \
--base-model runwayml/stable-diffusion-v1-5 \
--lora-path loras/example_lora \
--weight-name pytorch_lora_weights.safetensors \
--adapter-name example \
--trigger "toytrigger" \
--target "watercolor style" \
--anchor-prompts examples/prompts.json \
--output-dir outputs/steered_lora
By default, the public safety guard rejects obviously risky trigger or target terms. Keep this guard enabled for public demos and tutorials.
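To illustrate what a guard of this kind can look like, here is a minimal sketch based on a word blocklist. It is not the code in poisonlora/safety.py: the function name `check_terms`, the `allow_risky` parameter, and the blocklist entries are hypothetical, and any link between this parameter and the ALLOW_RISKY_RESEARCH setting is an assumption.

```python
import re

# Illustrative blocklist only; a real guard would need a curated, reviewed list.
BLOCKED_TERMS = frozenset({"nsfw", "gore", "weapon"})


def check_terms(trigger, target, allow_risky=False):
    """Raise ValueError if the trigger or target contains a blocked word."""
    if allow_risky:
        # Hypothetical opt-out for approved, private research settings.
        return
    for label, text in (("trigger", trigger), ("target", target)):
        # Compare whole lowercase words so substrings do not cause false hits.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        hits = sorted(words & BLOCKED_TERMS)
        if hits:
            raise ValueError(f"{label} contains blocked term(s): {', '.join(hits)}")


if __name__ == "__main__":
    check_terms("toytrigger", "watercolor style")  # benign terms pass silently
    print("terms accepted")
```

A word-level blocklist is easy to audit but easy to evade, so it works as a default safeguard for demos rather than as a complete filter.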
The compact distillation loop is exposed as a Python API. It does not include target-construction logic; callers provide their own dataloader and, when appropriate for a controlled experiment, a private batch-construction function.
from poisonlora.distillation import DistillationConfig, train_poisonous_distillation
cfg = DistillationConfig(
    output_dir="outputs/student_lora",
    rank=128,
    learning_rate=1e-4,
    max_train_steps=500,
    use_sam=False,
)
train_poisonous_distillation(
    unet=student_unet,
    teacher_unet=teacher_unet,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    train_dataloader=train_dataloader,
    cfg=cfg,
    target_batch_builder=None,
    noise_scheduler=noise_scheduler,
)
python scripts/clip_score.py --image-dir outputs/samples
The script expects each image to have a same-stem .txt prompt file.
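The same-stem convention means that an image such as `0001.png` is paired with `0001.txt` in the same directory. The sketch below shows one way to collect those pairs and to report images that lack a prompt file before scoring. The function name and the accepted image suffixes are assumptions; scripts/clip_score.py may handle this differently.

```python
from pathlib import Path

# Hypothetical set of image extensions to consider.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def collect_image_prompt_pairs(image_dir):
    """Return (pairs, missing): (image_path, prompt) tuples and images without a .txt."""
    pairs, missing = [], []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Same stem, .txt extension, same directory.
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.is_file():
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
        else:
            missing.append(image_path)
    return pairs, missing


if __name__ == "__main__":
    sample_dir = Path("outputs/samples")
    if sample_dir.is_dir():
        pairs, missing = collect_image_prompt_pairs(sample_dir)
        print(f"{len(pairs)} scored pairs, {len(missing)} images without prompts")
```

Reporting unpaired images separately avoids silently scoring a smaller set than intended.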
The original research workspace contained many one-off scripts, local absolute paths, generated images, private tokens, platform screenshots, logs, and rebuttal experiments. This public artifact keeps only the reusable method components and safe toy examples. Internal paths and task-specific private configs have been replaced by .env and YAML configuration.
If you use this artifact, please cite our paper:
@misc{chen2026customizationfirepluginpoisoning,
title = {Customization under Fire: Plugin Poisoning in Text-to-Image Ecosystem},
author = {Jiahao Chen and Xing He and Yong Yang and Xinfeng Li and Chunyi Zhou and Junhao Li and Zhe Ma and Tianyu Du and Shouling Ji},
year = {2026},
eprint = {2606.09151},
archivePrefix = {arXiv},
primaryClass = {cs.CR},
url = {https://arxiv.org/abs/2606.09151}
}
This artifact is released under the Research-Only Responsible Use License in LICENSE. Please review the responsible-use restrictions before redistribution or publication.