2 changes: 1 addition & 1 deletion AGENTS.md
@@ -28,7 +28,7 @@ Do not start from memory or old chat context. Re-anchor on repository files.

## Current Operating State

- Active work: `DIFFENCE Zenodo snapshot sync completed after GitHub lightweight diffusion MIA triage, DEB, CPSample, DSiRe / LoRA-WiSE, hyperparameter-free SecMI, DME, FreMIA, and CopyMark gates. Status: latest verdict note, workspace-evidence index, Research ROADMAP, AGENTS, intake/implementation workspace notes, and root ROADMAP are synchronized to the DIFFENCE Zenodo snapshot sync. Zenodo 10.5281/zenodo.13706131 publishes an immutable Diffence-master.zip code snapshot with matching MD5, 604 entries, code/config/split-index files, but still no classifier/diffusion checkpoints, defended/undefended logits, score rows, ROC arrays, metric JSON, or verifier. GitHub lightweight triage remains false-positive evidence only, and DEB remains paper-source-only grey-box mechanism watch. No MedMNIST/CIFAR/TinyImageNet/CelebA/LSUN/SVHN/Stable Diffusion/LoRA-WiSE/model/checkpoint/generated-image/notebook/Google Drive payload download, script execution, DEB implementation-from-paper, CPU sidecar, GPU work, Platform/Runtime row, schema change, or product copy is released. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after DIFFENCE Zenodo snapshot sync.`
- Active work: `Public metadata asset sweep completed after the DIFFENCE Zenodo snapshot sync, GitHub lightweight diffusion MIA triage, DEB, CPSample, DSiRe / LoRA-WiSE, hyperparameter-free SecMI, DME, FreMIA, and CopyMark gates. Status: latest verdict note, workspace-evidence index, Research ROADMAP, AGENTS, intake workspace note, and root ROADMAP are synchronized to the public metadata asset sweep. Authenticated Hugging Face metadata and GitHub artifact-shaped searches found no new non-duplicate image/latent-image diffusion-MIA replay packet. The only relevant HF surfaces remain known CLiD and CopyMark entries: CLiD's 1.62 GB gated zip still returns 403 for authenticated HEAD/range probes, and CopyMark's 5.66 GB zip is already covered by the official score-artifact gate. No CLiD/CopyMark ZIP, image payload, Stable Diffusion/CommonCanvas/LDM/Kohaku/COCO/LAION payload, model/checkpoint, full-repo download, script execution, CPU sidecar, GPU work, Platform/Runtime row, schema change, or product copy is released. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after public metadata asset sweep.`

medium

The updated status message in AGENTS.md removes "implementation" from the list of synchronized workspace notes. However, this pull request explicitly updates workspaces/implementation/challenger-queue.md (the implementation workspace note). To maintain accuracy and consistency with the PR description and the actual changes, the implementation workspace note should remain in the synchronization list.

Suggested change
- Active work: `Public metadata asset sweep completed after the DIFFENCE Zenodo snapshot sync, GitHub lightweight diffusion MIA triage, DEB, CPSample, DSiRe / LoRA-WiSE, hyperparameter-free SecMI, DME, FreMIA, and CopyMark gates. Status: latest verdict note, workspace-evidence index, Research ROADMAP, AGENTS, intake workspace note, and root ROADMAP are synchronized to the public metadata asset sweep. Authenticated Hugging Face metadata and GitHub artifact-shaped searches found no new non-duplicate image/latent-image diffusion-MIA replay packet. The only relevant HF surfaces remain known CLiD and CopyMark entries: CLiD's 1.62 GB gated zip still returns 403 for authenticated HEAD/range probes, and CopyMark's 5.66 GB zip is already covered by the official score-artifact gate. No CLiD/CopyMark ZIP, image payload, Stable Diffusion/CommonCanvas/LDM/Kohaku/COCO/LAION payload, model/checkpoint, full-repo download, script execution, CPU sidecar, GPU work, Platform/Runtime row, schema change, or product copy is released. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after public metadata asset sweep.`
- Active work: `Public metadata asset sweep completed after the DIFFENCE Zenodo snapshot sync, GitHub lightweight diffusion MIA triage, DEB, CPSample, DSiRe / LoRA-WiSE, hyperparameter-free SecMI, DME, FreMIA, and CopyMark gates. Status: latest verdict note, workspace-evidence index, Research ROADMAP, AGENTS, intake/implementation workspace notes, and root ROADMAP are synchronized to the public metadata asset sweep. Authenticated Hugging Face metadata and GitHub artifact-shaped searches found no new non-duplicate image/latent-image diffusion-MIA replay packet. The only relevant HF surfaces remain known CLiD and CopyMark entries: CLiD's 1.62 GB gated zip still returns 403 for authenticated HEAD/range probes, and CopyMark's 5.66 GB zip is already covered by the official score-artifact gate. No CLiD/CopyMark ZIP, image payload, Stable Diffusion/CommonCanvas/LDM/Kohaku/COCO/LAION payload, model/checkpoint, full-repo download, script execution, CPU sidecar, GPU work, Platform/Runtime row, schema change, or product copy is released. active_gpu_question = none; next_gpu_candidate = none; CPU sidecar = none selected after public metadata asset sweep.`

- Next GPU candidate: none selected
- Long-horizon control: follow `ROADMAP.md` section
`Long-Horizon Research Task Board(2026-05-13 起)` before reopening any
28 changes: 28 additions & 0 deletions ROADMAP.md
@@ -2,6 +2,34 @@

> Last updated: 2026-05-15

## 2026-05-15 Public Metadata Asset Sweep

Lane A checked the post-DIFFENCE public metadata surface before opening another
asset gate. Authenticated Hugging Face metadata search and GitHub
artifact-shaped searches still expose only already-known CLiD and CopyMark
surfaces: `zsf/COCO_MIA_ori_split1` has a gated `mia_COCO.zip`
(`1,620,731,171` bytes) and descriptive README, but authenticated `HEAD` and
`Range: bytes=-1048576` still return `403`, so no metadata-only ZIP central
directory or row manifest can be inspected. `chumengl/copymark` has a
non-gated `datasets.zip` (`5,662,307,542` bytes), but CopyMark's useful small
score/ROC/image-log artifacts are already covered by the official score gate.
GitHub code searches for replay-shaped artifacts such as
`member_scores_all_steps.pth`, `COCO_MIA_ori_split1`, and
`AUROC TPR_at_1_threshold diffusion` returned only already-covered CopyMark,
CLiD, or DiffAudit evidence hits.
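
The byte counts quoted above agree with the rounded sizes in the AGENTS.md status line (`1.62` GB and `5.66` GB) when GB is read as decimal gigabytes. A minimal check, assuming that reading; the helper name is illustrative and not part of the repository:

```python
def decimal_gb(size_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes (1 GB = 10**9 bytes), rounded to 2 places."""
    return round(size_bytes / 10**9, 2)


# Byte counts as reported by the Hugging Face sibling metadata in this sweep.
clid_zip_gb = decimal_gb(1_620_731_171)      # zsf/COCO_MIA_ori_split1 mia_COCO.zip
copymark_zip_gb = decimal_gb(5_662_307_542)  # chumengl/copymark datasets.zip

print(clid_zip_gb, copymark_zip_gb)
```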

Decision: `public metadata sweep / only known CLiD and CopyMark HF surfaces /
CLiD ZIP still range-inaccessible with auth / no new replay packet / no
download / no GPU release / no admitted row`. Do not download CLiD
`mia_COCO.zip`, CopyMark `datasets.zip`, image folders, Stable Diffusion /
CommonCanvas / LDM / Kohaku / COCO / LAION payloads, model folders, target or
shadow checkpoints; do not clone large external repositories by default, run
CLiD/CopyMark/PIA/PFAMI/SecMI/GSA scripts, regenerate features, fit attack
models, or launch GPU jobs from this sweep. Current slots remain
`active_gpu_question = none`, `next_gpu_candidate = none`, and
`CPU sidecar = none selected after public metadata asset sweep`. See
[docs/evidence/public-metadata-asset-sweep-20260515.md](docs/evidence/public-metadata-asset-sweep-20260515.md).

## 2026-05-15 GitHub Lightweight Diffusion MIA Triage

Lane A external asset search checked four direct GitHub search hits that looked
113 changes: 113 additions & 0 deletions docs/evidence/public-metadata-asset-sweep-20260515.md
@@ -0,0 +1,113 @@
# Public Metadata Asset Sweep

> Date: 2026-05-15
> Status: public metadata sweep / only known CLiD and CopyMark HF surfaces / CLiD ZIP still range-inaccessible with auth / no new replay packet / no download / no GPU release / no admitted row

## Question

After the DIFFENCE Zenodo snapshot sync and the lightweight GitHub triage, does
fresh public metadata from Hugging Face or GitHub expose a non-duplicate
image/latent-image diffusion-MIA replay packet with target identity,
member/nonmember semantics, and response or score artifacts?

This sweep used authenticated Hugging Face metadata, small dataset README
reads, GitHub repository search, and GitHub code search. It did not download
Hugging Face ZIP payloads, image folders, model weights, checkpoints, generated
responses, or full external repositories, and it did not run attack scripts or
GPU jobs.

## Surfaces Checked

| Surface | Result |
| --- | --- |
| Hugging Face dataset search terms | `diffusion membership inference`, `membership inference diffusion`, `MIA diffusion`, `COCO_MIA`, `CopyMark`, `SecMI`, `CLiD`, `privacy diffusion model` |
| Relevant HF hits | `zsf/COCO_MIA_ori_split1` and `chumengl/copymark` only |
| Lexical false positives | `clides/*`, `CliDyn/*`, `SWE-Arena/cli_data`, and other unrelated `CLiD` string matches |
| GitHub repository search | Recent broad queries returned survey/awesome repos or unrelated infrastructure, not new artifact-bearing diffusion-MIA repos |
| GitHub code search | Exact artifact queries such as `member_scores_all_steps.pth` and `COCO_MIA_ori_split1` only returned already-covered CopyMark, CLiD, or DiffAudit evidence files |

## Hugging Face Findings

Authenticated metadata access is available for the local account, but it does
not change the CLiD boundary.

| Dataset | Metadata finding | Decision impact |
| --- | --- | --- |
| `zsf/COCO_MIA_ori_split1` | `private = false`, `gated = auto`, `lastModified = 2025-01-04T07:57:18Z`, `3` siblings: `.gitattributes` (`2,307` bytes), `README.md` (`871` bytes), and `mia_COCO.zip` (`1,620,731,171` bytes, blob `d5f7fa657f00e2867ce38a060a2e7c4661e2f8be`) | Still CLiD candidate-only. The dataset card says the ZIP is a randomly selected MS-COCO packet processed for CLiD fine-tuning, but it exposes no public image ID, caption, row order, member/nonmember, or score manifest preview. Authenticated `HEAD` and `Range: bytes=-1048576` against `mia_COCO.zip` still returned `403`, so ZIP central-directory inspection is not available without resolving access/download policy. |
| `chumengl/copymark` | `private = false`, `gated = false`, `lastModified = 2024-06-17T06:12:46Z`, `3` siblings: `.gitattributes` (`2,307` bytes), `README.md` (`36` bytes), and `datasets.zip` (`5,662,307,542` bytes, blob `c097608a500782a0d84938541d9472d9c0db190f`) | Already covered by the CopyMark provenance and official score-artifact gates. Downloading the `5.66` GB ZIP would not answer a new current decision because the useful small score/ROC/log artifacts are already committed in `caradryanl/CopyMark`, and the missing blockers remain checkpoint hashes, row-ID-bound score manifests, small immutable packets, and ready verifiers. |

The CLiD dataset README remains descriptive only: it links the NeurIPS 2024
CLiD paper and official code, and says the dataset was randomly selected from
MS-COCO and processed for CLiD. It does not publish row identities or split
manifests.
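
For context on why the `Range: bytes=-1048576` probe matters: that suffix range asks for the last 1 MiB of the file, which is where a ZIP archive keeps its end-of-central-directory record. If the tail were readable, the entry count and central-directory location could be recovered without touching any image payload. The sketch below illustrates the parsing step only, on an in-memory demo archive; it is not the probe used in this sweep, the function names are hypothetical, and it does not handle ZIP64 archives.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# Fixed 22-byte end-of-central-directory record (little-endian), comment excluded.
EOCD_STRUCT = struct.Struct("<4sHHHHIIH")


def parse_eocd(tail: bytes) -> dict:
    """Find and decode the end-of-central-directory record in an archive tail."""
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_STRUCT.size > len(tail):
        raise ValueError("no end-of-central-directory record in tail")
    (_sig, _disk, _cd_disk, _entries_on_disk, total_entries,
     cd_size, cd_offset, _comment_len) = EOCD_STRUCT.unpack_from(tail, pos)
    return {"entries": total_entries, "cd_size": cd_size, "cd_offset": cd_offset}


def build_demo_zip() -> bytes:
    """Build a tiny stand-in archive; file names and contents are made up."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("manifest.csv", "row_id,role\n0,member\n")
        zf.writestr("scores.json", "{}")
    return buf.getvalue()


demo = build_demo_zip()
# Mimic a suffix-range response: keep at most the last 1 MiB of the archive.
info = parse_eocd(demo[-1048576:])
print(info)
```

With a `403` on both `HEAD` and the suffix range, no tail bytes are available, so this step cannot start for `mia_COCO.zip`.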

## GitHub Findings

The exact code searches were intentionally artifact-shaped rather than paper
title-shaped:

| Query | New artifact result |
| --- | --- |
| `score_result_test.json diffusion` | no new non-DiffAudit artifact hit |
| `member_scores_all_steps.pth` | only `caradryanl/CopyMark` and existing DiffAudit evidence |
| `mia_eval_idxs diffusion` | no new non-DiffAudit artifact hit |
| `COCO_MIA_ori_split1` | only `zhaisf/CLiD` and existing DiffAudit evidence |
| `AUROC TPR_at_1_threshold diffusion` | no new non-DiffAudit artifact hit |

The broader repository searches for recently pushed repositories containing
`membership inference`, `stable diffusion`, `member nonmember`, `AUROC`, or
`score` mostly returned survey lists, awesome lists, and unrelated application
repositories. They did not expose a new candidate with target checkpoint
identity, exact member/nonmember rows, response packets, score rows, ROC arrays,
metric JSON, or a verifier.

## Decision

`public metadata sweep / only known CLiD and CopyMark HF surfaces /
CLiD ZIP still range-inaccessible with auth / no new replay packet / no
download / no GPU release / no admitted row`.

This closes the immediate Hugging Face/GitHub metadata branch for the current
cycle. The only relevant HF assets are the already-known CLiD and CopyMark
surfaces:

- CLiD remains strong candidate evidence, but still cannot be promoted because
its public score rows are numeric-only and the gated ZIP does not expose a
metadata-only manifest or central directory through authenticated range
access.
- CopyMark remains official Research-side score-artifact support evidence, but
the HF dataset ZIP is too large and not decision-changing because the public
GitHub tree already exposes the useful small score/ROC/image-log artifacts.

Current slots remain `active_gpu_question = none`,
`next_gpu_candidate = none`, and
`CPU sidecar = none selected after public metadata asset sweep`.

Smallest valid reopen condition:

- CLiD publishes or exposes a row manifest mapping `inter_output/*` rows to
immutable MS-COCO image IDs, captions, target/shadow split, and
member/nonmember role; or authenticated metadata-only ZIP inspection becomes
possible without downloading image payloads.
- CopyMark publishes a compact row-ID-bound score manifest, checkpoint hashes,
a no-training verifier, or a small immutable data/checkpoint packet that
avoids the full HF ZIP and model-folder downloads.
- A new public repository or dataset appears with a genuinely new, small
score/response/ROC/metric/verifier packet rather than only code, README,
notebooks, figures, or large raw image/model archives.
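
The reopen conditions above can be read as a checklist over a candidate surface. The following is a minimal sketch of that reading, not repository code: the `Candidate` fields, the `missing_for_reopen` helper, and the example values are all hypothetical and do not record findings about any real dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one public surface; field names are illustrative."""
    name: str
    target_identity: bool       # target checkpoint identity is pinned
    member_rows: bool           # exact member/nonmember rows are published
    score_artifacts: bool       # score rows, ROC arrays, metric JSON, or responses
    already_covered: bool       # duplicates an existing gate
    needs_large_download: bool  # reachable only through raw image/model archives


def missing_for_reopen(c: Candidate) -> list[str]:
    """Return the blockers that keep a candidate from reopening the branch."""
    blockers = []
    if not c.target_identity:
        blockers.append("target identity")
    if not c.member_rows:
        blockers.append("member/nonmember rows")
    if not c.score_artifacts:
        blockers.append("score or response artifacts")
    if c.already_covered:
        blockers.append("not a new surface")
    if c.needs_large_download:
        blockers.append("small packet")
    return blockers


# Illustrative inputs only.
complete = Candidate("hypothetical/new-packet", True, True, True, False, False)
partial = Candidate("hypothetical/raw-archive", True, False, True, False, True)

print(missing_for_reopen(complete))
print(missing_for_reopen(partial))
```

An empty list corresponds to a candidate worth a new gate; any remaining blocker keeps the current decision in place.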

Stop condition:

- Do not download `zsf/COCO_MIA_ori_split1/mia_COCO.zip`,
`chumengl/copymark/datasets.zip`, image folders, Stable Diffusion weights,
CommonCanvas/LDM/Kohaku/COCO/LAION payloads, or target/shadow checkpoints.
- Do not clone large external repositories by default, run CLiD/CopyMark/PIA/
PFAMI/SecMI/GSA scripts, regenerate features, fit attack models, or launch
GPU work from this sweep.
- Do not change Platform/Runtime admitted rows, schemas, recommendation logic,
product copy, or admitted evidence bundles.

## Platform and Runtime Impact

None. Platform and Runtime continue consuming only the admitted `recon / PIA
baseline / PIA defended / GSA / DPDM W-1` set.