Open
478 commits
2f6f1cb
chore: remove .vscode from tracking, add to .gitignore
adrianwedd Mar 29, 2026
9072dec
deploy: 14 new NLM infographics, fix all broken image refs (750 pages)
adrianwedd Mar 29, 2026
49e7b07
deploy: add image frontmatter for 6 newly infographic'd daily papers …
adrianwedd Mar 29, 2026
c726769
content: remove 6 duplicate blog posts + 16 misplaced daily-paper ent…
adrianwedd Mar 29, 2026
62e55d7
deploy: rebuild after blog dedup + daily-paper cleanup (728 pages)
adrianwedd Mar 29, 2026
d41033e
fix: restore 7 paper reviews to /blog/ (original shared URLs), remove…
adrianwedd Mar 29, 2026
460b5c9
deploy: 13 new daily papers from DeepInception citation network (740 …
adrianwedd Mar 29, 2026
42de0ac
feat: cross-link blog and daily-paper collections
adrianwedd Mar 29, 2026
aa56ea4
fix: move 7 paper reviews to /daily-paper/ with OG-preserving redirec…
adrianwedd Mar 29, 2026
cb8e04c
brand: new fractured hexagon logo + OG image
adrianwedd Mar 29, 2026
d69e4b6
docs: update 5 stale public-facing files to reflect current project s…
adrianwedd Mar 29, 2026
7f948bc
feat: add content type filters to /new/ page + include reports/policy…
adrianwedd Mar 29, 2026
9380cc1
refresh: /about/ page with current metrics, key findings, stat grid
adrianwedd Mar 29, 2026
23ad674
disclosure: add "How this team works" section to /about/team/
adrianwedd Mar 29, 2026
bdea5a6
site: add Google AdSense (ca-pub-6275306310835906) to BaseLayout head
adrianwedd Mar 30, 2026
7088297
site: add ads.txt for Google AdSense verification
adrianwedd Mar 30, 2026
311ab58
deploy: AdSense, ads.txt, visual jailbreaks blog post draft
adrianwedd Mar 30, 2026
4362304
deploy: eight-layers-of-visual-jailbreaks blog post
adrianwedd Mar 30, 2026
dd68c6a
fix: remove daily-paper media from docs/ to fix GitHub Pages build
adrianwedd Mar 30, 2026
c0fb702
fix: remove 15 duplicate daily-paper posts (same arxiv ID, different …
adrianwedd Mar 30, 2026
f367d38
daily-paper: +16 posts filling Mar 20-22, 24-26, 30-31 gaps
adrianwedd Mar 30, 2026
a8b9fab
deploy: +16 daily papers, dedup cleanup, coverage Mar 20-31
adrianwedd Mar 30, 2026
538b64b
daily-paper: +2 papers for 2026-03-30 [automated]
claude Mar 30, 2026
585ce7f
fix: migrate daily-paper media paths to cdn.failurefirst.org (R2)
adrianwedd Mar 31, 2026
6bc026f
Merge branch 'main' of https://github.com/adrianwedd/failure-first
adrianwedd Mar 31, 2026
71f6d3d
deploy: CDN media paths, automated daily papers
adrianwedd Mar 31, 2026
74a4931
site: publish all research papers with PDF downloads
adrianwedd Mar 31, 2026
18300a4
deploy: papers + CDN media cleanup (ENOSPC fix)
adrianwedd Mar 31, 2026
2419eec
site: add Cloudflare Web Analytics + Sentry placeholders to BaseLayout
adrianwedd Mar 31, 2026
9679ad0
site: add Sentry error tracking (@sentry/astro) + CF Web Analytics pl…
adrianwedd Mar 31, 2026
fc91bec
deploy: Sentry error tracking + monitoring infrastructure
adrianwedd Mar 31, 2026
e9391a5
fix: remove manual CF Web Analytics snippet (auto-injected by proxy)
adrianwedd Mar 31, 2026
6b996ee
site: individual rendered pages for all 9 research papers
adrianwedd Mar 31, 2026
4043c19
fix: replace anonymous CCS PDFs with authored versions
adrianwedd Mar 31, 2026
716f3c3
deploy: paper pages + non-anonymous PDFs
adrianwedd Mar 31, 2026
aff461a
content: add image frontmatter to 6 daily papers (Mar 3-8)
adrianwedd Mar 31, 2026
586d441
build: site rebuild with Mar 3-8 image frontmatter
adrianwedd Mar 31, 2026
92de8cd
daily-paper: +2 papers for 2026-03-31 [automated]
claude Mar 31, 2026
35b47c7
content: add video frontmatter to 3 daily papers (g0dm0d3, 2603.25727…
adrianwedd Apr 1, 2026
2855325
build: site rebuild with video frontmatter for 3 daily papers
adrianwedd Apr 1, 2026
9d58d7f
fix: paper page rendering — math (KaTeX), Pandoc artifacts, status enums
adrianwedd Apr 1, 2026
3130160
build: site rebuild with math rendering and Pandoc fix (755 pages)
adrianwedd Apr 1, 2026
306086b
site: deploy redacted visual jailbreaks blog post (pre-CVD)
adrianwedd Apr 1, 2026
ab877d3
build: restore site output (755 pages, math + Pandoc fixes applied)
adrianwedd Apr 1, 2026
573af3b
fix: citation accuracy — remove hallucinated refs, fix grader count
adrianwedd Apr 1, 2026
ba302a7
reconcile: align all 9 paper page metrics to CANONICAL_METRICS.md
adrianwedd Apr 1, 2026
74d49c4
deploy: rebuild failurefirst.org (788 files, 756 pages)
adrianwedd Apr 1, 2026
1259a5b
site: update corpus stats to canonical (231 models, 135,305 results)
adrianwedd Apr 1, 2026
d48b78e
deploy: rebuild failurefirst.org (823 files, 756 pages)
adrianwedd Apr 1, 2026
022317a
site: redistribute 17 batched daily papers to unique consecutive dates
adrianwedd Apr 2, 2026
2e9e2d1
site: publish ST3GG blog post + sync remaining canonical stats
adrianwedd Apr 2, 2026
222eb39
deploy: rebuild failurefirst.org (987 files, 757 pages)
adrianwedd Apr 2, 2026
ec95d65
site: add NLM infographic to ST3GG blog post
adrianwedd Apr 2, 2026
712ee26
site: deploy ST3GG blog post + stats update (S20W2)
adrianwedd Apr 2, 2026
460bef0
site: update ST3GG blog post — NLM video, infographic-v2, eval findings
adrianwedd Apr 2, 2026
2f66ad6
chore(site): rebuild docs/ 2026-04-02 14:09
adrianwedd Apr 2, 2026
3d23524
site: ST3GG blog post QA fixes — correct Unicode detection results, P…
adrianwedd Apr 2, 2026
f56d74c
chore(site): rebuild docs/ 2026-04-02 14:19
adrianwedd Apr 2, 2026
f870695
site: make post content images responsive — width 100%, height auto
adrianwedd Apr 2, 2026
bc5900d
chore(site): rebuild docs/ 2026-04-02 14:23
adrianwedd Apr 2, 2026
ed4140c
deploy: publish missing daily-paper pages for facebook link remediation
adrianwedd Apr 2, 2026
3c543f9
daily-paper: +2 papers for 2026-04-20 [automated]
claude Apr 3, 2026
615402e
chore: add pagefind search index from 2026-04-20 build
claude Apr 3, 2026
20cc966
daily-paper: +2 papers for 2026-04-04 [automated]
claude Apr 4, 2026
f5d1cb2
blog: inaugural weekly AI safety digest — 2026-04-05
adrianwedd Apr 5, 2026
b25d98a
daily-paper: red-teaming as security theater revisited (2026-04-05)
adrianwedd Apr 5, 2026
3db5889
deploy: weekly AI safety digest + red-teaming daily paper
adrianwedd Apr 5, 2026
0161c04
blog: rename weekly digest to AI Safety Daily format
adrianwedd Apr 5, 2026
eafda61
fix: remove stale 'this week' reference in daily digest
adrianwedd Apr 5, 2026
06475d2
deploy: AI Safety Daily + red-teaming daily paper (QA pass)
adrianwedd Apr 5, 2026
f68a407
daily-paper: +2 papers for 2026-04-05 [automated]
claude Apr 5, 2026
0db1f29
blog: AI Safety Daily 2026-04-06
adrianwedd Apr 5, 2026
01ea120
deploy: AI Safety Daily 2026-04-06
adrianwedd Apr 5, 2026
145dfe1
docs: backfill Nov 2025 daily papers (10 posts) — stable URL slugs
adrianwedd Apr 5, 2026
7af1361
content: add 10 daily papers (Oct 2025) — foundational AI safety rese…
adrianwedd Apr 5, 2026
b584810
daily papers: September 2025 foundational AI safety backfill (8 papers)
adrianwedd Apr 5, 2026
21482e3
blog: AI Safety Daily 2026-04-07 — red-teaming theater, AEGIS VLA saf…
adrianwedd Apr 6, 2026
a37c255
fix: correct paperType enum 'policy' -> 'position' for RSP daily paper
adrianwedd Apr 6, 2026
64e6d15
fix: correct paperType enum 'benchmark' -> 'methods' for HarmBench da…
adrianwedd Apr 6, 2026
0ae64b7
deploy: AI Safety Daily 2026-04-07 + Sep 2025 backfill
adrianwedd Apr 6, 2026
277676d
deploy: rebuild failurefirst.org (976 files, 803 pages)
adrianwedd Apr 6, 2026
50ea357
daily-paper: +2 papers for 2026-04-06 [automated]
claude Apr 6, 2026
cc0dd9c
fix: reassign 14 future-dated daily papers to past dates (8 backfille…
adrianwedd Apr 7, 2026
91c6dba
fix: update date frontmatter in 14 reassigned daily papers
adrianwedd Apr 7, 2026
d3395f8
fix: strip date prefix from daily-paper URLs
adrianwedd Apr 7, 2026
bff3d7e
fix: enforce one-daily-paper-per-day — spread 18 doubled dates to gaps
adrianwedd Apr 7, 2026
67bc192
deploy: rebuild failurefirst.org (1386 files, 805 pages)
adrianwedd Apr 7, 2026
df62b02
site: update AI Safety Daily 2026-04-08 with Sprint 23 Gemma 4 correc…
adrianwedd Apr 7, 2026
8ce069c
site: add Report #349 Gemma Family Safety Scaling + link from daily blog
adrianwedd Apr 7, 2026
3656ac4
feat: add research videos page with 19 cinematic NLM overviews
adrianwedd Apr 7, 2026
d43ebe0
fix: strip date prefix from daily-paper links on /new/ and blog pages
adrianwedd Apr 7, 2026
ab8bd45
fix: last hardcoded date-prefixed daily-paper link (red-teaming-secur…
adrianwedd Apr 7, 2026
fce8361
daily-paper: +2 papers for 2026-04-07 [automated]
claude Apr 7, 2026
fab1e0b
fix: publish 2 draft daily papers (gameplayqa, lipschitz)
adrianwedd Apr 7, 2026
9d5a1ac
site: publish Report #350 — Claude Mythos System Card analysis
adrianwedd Apr 8, 2026
8deca53
fix: rename daily-digest duplicate to proper Gemma 4 blog post
adrianwedd Apr 8, 2026
3f40508
feat: embed video + slides in Report #350, add audio support to video…
adrianwedd Apr 8, 2026
917dd2d
fix: sync Report #350 site version with QA language fixes
adrianwedd Apr 8, 2026
2e61bc0
daily-paper: +2 papers for 2026-04-08 [automated]
claude Apr 8, 2026
e818d79
feat: publish daily papers Apr 10-11 (DAERT VLA red-teaming, ROSClaw)
adrianwedd Apr 10, 2026
0fada52
chore(site): rebuild docs/ 2026-04-10 21:44
adrianwedd Apr 10, 2026
6eda66d
feat: publish daily papers Apr 12-13 (MammoBot hazard mgmt, TraceSafe…
adrianwedd Apr 10, 2026
cc2a470
chore(site): rebuild docs/ 2026-04-10 21:50
adrianwedd Apr 10, 2026
e69cac2
fix: set Apr 12-13 daily papers to draft (future dates)
adrianwedd Apr 10, 2026
d9326cd
chore(site): rebuild docs/ 2026-04-10 22:01
adrianwedd Apr 10, 2026
7bbd7ee
fix(audio): point 25 daily paper audio fields to CDN, fix Report #349…
adrianwedd Apr 10, 2026
b773254
chore(site): rebuild docs/ 2026-04-10 22:54
adrianwedd Apr 10, 2026
d5ada29
fix: add /reports/* → /research/reports/* redirect
adrianwedd Apr 10, 2026
a3b2007
chore(site): rebuild docs/ 2026-04-10 22:56
adrianwedd Apr 10, 2026
530a107
fix(audio): move Apr 10-11 paper audio refs to CDN
adrianwedd Apr 10, 2026
d875cd2
fix(audio): add CDN audio for 2604.04759, 2603.28301
adrianwedd Apr 10, 2026
368b137
chore(site): rebuild docs/ 2026-04-10 23:25
adrianwedd Apr 10, 2026
5684002
chore(site): rebuild docs/ 2026-04-11 02:26
adrianwedd Apr 10, 2026
60fe0ce
fix(audio): add CDN audio for 10 daily papers (batch 1)
adrianwedd Apr 10, 2026
c5ebcc9
fix(audio): add CDN audio for 10 daily papers (batch 2)
adrianwedd Apr 10, 2026
9e2dbfe
chore(site): rebuild docs/ 2026-04-11 04:22
adrianwedd Apr 10, 2026
8cb83f1
fix(audio): add CDN audio for 6 daily papers (batch 3)
adrianwedd Apr 10, 2026
5c10aa3
chore(site): rebuild docs/ 2026-04-11 06:19
adrianwedd Apr 10, 2026
d1fe3e9
fix(audio): add CDN audio for 2 retry papers (batch 3 retries)
adrianwedd Apr 10, 2026
8fcb945
fix(audio): add CDN audio for 7 daily papers (batch 4)
adrianwedd Apr 10, 2026
ae8fdd2
chore(site): rebuild docs/ 2026-04-11 07:50
adrianwedd Apr 10, 2026
d0899eb
fix(audio): wire sprint23 blog audio + fix daily-paper media paths
adrianwedd Apr 10, 2026
56a7fc2
chore(site): rebuild docs/ 2026-04-11 08:24
adrianwedd Apr 10, 2026
78b4a67
daily-paper: +2 papers for 2026-04-10 [automated]
claude Apr 10, 2026
de64470
fix(audio): add CDN audio for 7 daily papers (batch 5)
adrianwedd Apr 10, 2026
d47437a
chore(site): rebuild docs/ 2026-04-11 09:27
adrianwedd Apr 10, 2026
0590473
fix(blog): move audio fields from body to frontmatter; add infographics
adrianwedd Apr 11, 2026
2be54a4
chore(site): rebuild docs/ 2026-04-11 10:13
adrianwedd Apr 11, 2026
4bc9b6b
fix(audio): add CDN audio for 4 daily papers (batch 6)
adrianwedd Apr 11, 2026
b33149a
feat(reports): add audio/video to report 349 + report layout support
adrianwedd Apr 11, 2026
bc569dd
chore(site): rebuild docs/ 2026-04-11 10:26
adrianwedd Apr 11, 2026
ce9521e
Batch 7 audio wiring: 6 daily papers to CDN
adrianwedd Apr 11, 2026
96322d7
chore(site): rebuild docs/ 2026-04-11 11:43
adrianwedd Apr 11, 2026
b3a6ee3
Add Report #352 and companion blog post on NotebookLM red-teaming
adrianwedd Apr 11, 2026
7475da8
chore(site): rebuild docs/ 2026-04-11 20:03
adrianwedd Apr 11, 2026
778bcd6
Report #352 urgent corrections after external QA pass
adrianwedd Apr 11, 2026
a6a2634
chore(site): rebuild docs/ 2026-04-11 21:52
adrianwedd Apr 11, 2026
9e4aabc
Report #352 v2 corrections after Martha/Tegan/Ace QA pass
adrianwedd Apr 11, 2026
1982b8b
chore(site): rebuild docs/ 2026-04-11 21:56
adrianwedd Apr 11, 2026
a00754a
Report #352 v3: structural additions + failure-first cinematic video
adrianwedd Apr 11, 2026
76db0cd
chore(site): rebuild docs/ 2026-04-11 22:32
adrianwedd Apr 11, 2026
7ceb1c5
daily-paper: +2 papers for 2026-04-11 [automated]
claude Apr 11, 2026
6325a32
AI Safety Daily — April 12, 2026
adrianwedd Apr 12, 2026
3dc1664
chore(site): rebuild docs/ 2026-04-12 13:18
adrianwedd Apr 12, 2026
156765f
Wave 1 backlog promotion: 49 PROMOTE reports from Donna Noble triage
adrianwedd Apr 12, 2026
2de3342
chore(site): rebuild docs/ 2026-04-12 13:25
adrianwedd Apr 12, 2026
f512da9
daily-paper: +2 papers for 2026-04-12 [automated]
claude Apr 12, 2026
2653c69
blog: add AI Safety Daily for April 13, 2026
adrianwedd Apr 13, 2026
91e354f
chore(site): rebuild docs/ 2026-04-14 00:25
adrianwedd Apr 13, 2026
baca88d
Wave 2 backlog promotion batch 1: 10 reports (public count 93->103)
adrianwedd Apr 13, 2026
0e15a3b
blog: add AI Safety Daily for April 14, 2026
adrianwedd Apr 13, 2026
7599e48
chore(site): rebuild docs/ 2026-04-14 00:56
adrianwedd Apr 13, 2026
7e92f8f
daily-paper: +2 papers for 2026-04-13 [automated]
claude Apr 13, 2026
6e9c487
Promote reports #122-132 (batch 2) to public repo
adrianwedd Apr 14, 2026
1832239
Promote reports #135 and #139 to public repo (Wave 2 batch 3)
adrianwedd Apr 14, 2026
4679056
daily-paper: +2 papers for 2026-04-14 [automated]
claude Apr 14, 2026
6c00bee
chore(site): rebuild docs/ 2026-04-16 08:01
adrianwedd Apr 15, 2026
f529806
Merge remote-tracking branch 'origin/main'
adrianwedd Apr 15, 2026
b020c7a
chore(site): rebuild docs/ 2026-04-16 08:03
adrianwedd Apr 15, 2026
4c2992f
chore(site): rebuild docs/ 2026-04-16 08:13
adrianwedd Apr 15, 2026
d81b570
daily-paper: +2 papers for 2026-04-15 [automated]
claude Apr 15, 2026
52e847b
chore(site): rebuild docs/ 2026-04-16 08:56
adrianwedd Apr 15, 2026
87fcb11
Reports batch 1 (300-339): 37 promoted + frontmatter fixes
adrianwedd Apr 15, 2026
ea37934
Resolve merge conflicts: keep em-dash classification fixes
adrianwedd Apr 15, 2026
00226a9
Fix report frontmatter: lowercase status, add descriptions
adrianwedd Apr 15, 2026
f3f8c40
Batch 2: 93 reports promoted (200-299) + frontmatter fixes
adrianwedd Apr 15, 2026
060cb9a
chore(site): rebuild docs/ 2026-04-16 09:08
adrianwedd Apr 15, 2026
2abc501
Batch 3: 46 reports promoted (100-199 backlog) — complete 100-series …
adrianwedd Apr 16, 2026
1f11b3c
chore(site): rebuild docs/ 2026-04-16 11:01
adrianwedd Apr 16, 2026
30d22ed
feat: add inline audio player to blog and report layouts
adrianwedd Apr 16, 2026
a6acef8
chore(site): rebuild docs/ — inline audio player + daily blog audio l…
adrianwedd Apr 16, 2026
cc6e083
feat(blog): add OG images to 3 blog posts from NLM infographic backlog
adrianwedd Apr 16, 2026
03f7ee0
chore(site): rebuild docs/ — 3 new blog infographic links
adrianwedd Apr 16, 2026
d628f22
daily-paper: +2 papers for 2026-04-16 [automated]
claude Apr 16, 2026
e44d0c7
deploy: ai-safety-daily-2026-04-17
adrianwedd Apr 16, 2026
b3b9538
rename: sprint23 → featured (public-facing CDN paths)
adrianwedd Apr 17, 2026
b2cd3c9
fix: 404s on daily blog infographics + KaTeX integrity hash
adrianwedd Apr 17, 2026
87938eb
chore(site): rebuild — daily blog CDN image paths + KaTeX hash fix
adrianwedd Apr 17, 2026
6fa216a
inject NLM asset frontmatter: 134 audio + 28 infographics + 37 slides…
adrianwedd Apr 17, 2026
77d260c
inject NLM audio frontmatter into daily-paper content
adrianwedd Apr 17, 2026
98e95d5
inject NLM slides frontmatter into reports content
adrianwedd Apr 17, 2026
8ab8828
chore(site): rebuild — NLM assets live (134 audio + 37 slides + 28 im…
adrianwedd Apr 17, 2026
e1211f8
daily-paper: +2 papers for 2026-04-17 [automated]
claude Apr 17, 2026
3847721
nlm: inject blog audio frontmatter (6 posts, batch 1/2)
adrianwedd Apr 18, 2026
260fff8
nlm: inject blog audio frontmatter (5 posts, batch 2/3)
adrianwedd Apr 18, 2026
2d10907
nlm: inject blog audio frontmatter (5 posts, batch 3/4)
adrianwedd Apr 19, 2026
7964270
nlm: inject blog audio frontmatter (2 posts, batch 4/4 — final)
adrianwedd Apr 19, 2026
eee19fc
nlm: inject blog infographic frontmatter (9 posts, batch 1/2)
adrianwedd Apr 19, 2026
82b5364
nlm: inject blog infographic frontmatter (2 posts, batch 2/2 — final)
adrianwedd Apr 19, 2026
73a0683
nlm: inject blog infographic frontmatter (12 posts, R3)
adrianwedd Apr 19, 2026
b84369a
daily-paper: +2 papers for 2026-04-19 [automated]
claude Apr 19, 2026
07f77de
Daily Paper: Auto-process 2026-04-14 through 2026-04-20 (manual catch…
adrianwedd Apr 20, 2026
b19c03a
daily-paper: +2 papers for 2026-04-20 [automated]
claude Apr 20, 2026
3ea4a52
Daily Paper: Auto-process 2026-04-21 (5 papers)
adrianwedd Apr 21, 2026
dcd4b04
feat(services): wire services collection + publish 6 service pages
adrianwedd Apr 21, 2026
acc1c13
nlm: inject blog infographic frontmatter (8 posts, Sprint 28 close-out)
adrianwedd Apr 21, 2026
88c97b6
daily-paper: +2 papers for 2026-04-21 [automated]
claude Apr 21, 2026
97770ec
daily-paper: +2 papers for 2026-04-22 [automated]
claude Apr 22, 2026
8177fb2
daily-paper: +2 papers for 2026-04-23 [automated]
claude Apr 23, 2026
d4c1e4b
fix(blog): add video controls to ST3GG post, remove muted autoplay
adrianwedd Apr 23, 2026
85c975c
chore(site): rebuild docs/ 2026-04-24 09:07
adrianwedd Apr 23, 2026
ade4b07
content(daily): AI Safety Daily — April 18, 2026
adrianwedd Apr 24, 2026
f96fd8e
content(daily): AI Safety Daily — April 19, 2026
adrianwedd Apr 24, 2026
437798d
content(daily): AI Safety Daily — April 20, 2026
adrianwedd Apr 24, 2026
3340141
content(daily): AI Safety Daily — April 21, 2026
adrianwedd Apr 24, 2026
d51c965
content(daily): AI Safety Daily — April 22, 2026
adrianwedd Apr 24, 2026
3eefe68
content(daily): AI Safety Daily — April 23, 2026
adrianwedd Apr 24, 2026
95d4546
content(daily): AI Safety Daily — April 24, 2026
adrianwedd Apr 24, 2026
258d932
daily-paper: +2 papers for 2026-04-24 [automated]
claude Apr 24, 2026
275ef00
chore(site): rebuild docs/ 2026-04-25 07:37
adrianwedd Apr 24, 2026
0fb05f7
daily-paper: +2 papers for 2026-04-24 [automated]
claude Apr 24, 2026
97ea950
content(blog): add NLM infographics for 12 posts (2026-04-25 batch 2)
adrianwedd Apr 25, 2026
699e1a2
chore(site): rebuild docs/ 2026-04-25 15:24
adrianwedd Apr 25, 2026
0d204b8
content(blog): add NLM infographics for 15 posts (2026-04-25 batch 3)…
adrianwedd Apr 25, 2026
c6fbb12
fix(blog): EP-63 frontmatter — date field + remove invalid category
adrianwedd Apr 25, 2026
2bf5abf
chore(site): rebuild docs/ 2026-04-25 17:24
adrianwedd Apr 25, 2026
62b5646
chore(site): rebuild docs/ 2026-04-25 17:52
adrianwedd Apr 25, 2026
c84a025
chore(site): rebuild docs/ 2026-04-26 07:55
adrianwedd Apr 25, 2026
1e97741
daily-paper: +2 papers for 2026-04-25 [automated]
claude Apr 25, 2026
7ec5c54
infra(nlm): inject audio frontmatter for 64 recovered NLM downloads
adrianwedd Apr 26, 2026
c09efbe
daily-paper: +2 papers for 2026-04-26 [automated]
claude Apr 26, 2026
e1213df
daily-paper: +2 papers for 2026-04-27 [automated]
claude Apr 27, 2026
e044292
daily-paper: +2 papers for 2026-04-28 [automated]
claude Apr 28, 2026
3132074
daily-paper: +2 papers for 2026-04-29 [automated]
claude Apr 29, 2026
77a32a9
daily-paper: +2 papers for 2026-04-30 [automated]
claude Apr 30, 2026
d28d9f5
daily-paper: +2 papers for 2026-05-02 [automated]
claude May 2, 2026
1592b0a
chore(site): rebuild docs/ 2026-05-03 16:31
adrianwedd May 3, 2026
db0cfe7
chore(site): rebuild docs/ 2026-05-03 16:48
adrianwedd May 3, 2026
00c40d3
content(site): infographics + markdown + frontmatter fixes (audio def…
adrianwedd May 3, 2026
9f2d3c4
feat(podcast): add Apple Podcasts feed + 3000x3000 cover
adrianwedd May 3, 2026
259d38a
fix(podcast): correct daily-paper URLs, channel link, and enclosure M…
adrianwedd May 3, 2026
7cbe206
feat(podcast): per-episode square cover art from infographics
adrianwedd May 3, 2026
529d274
fix(podcast): add built image copies to docs/ (missed in previous com…
adrianwedd May 3, 2026
486e6a3
content(daily-paper): add audio for May 2 papers (VeriGuard + IJA)
adrianwedd May 3, 2026
ab70d19
content(daily-paper): add infographic for VeriGuard
adrianwedd May 3, 2026
fed9537
content(daily-paper): add video frontmatter for VeriGuard + IJA; dail…
adrianwedd May 3, 2026
ea3fbbc
fix: restore daily-research infographics deleted by prior npm build
adrianwedd May 3, 2026
81bd324
chore(site): rebuild docs/ 2026-05-04 02:26
adrianwedd May 3, 2026
4b14ca4
daily-paper: +2 papers for 2026-05-03 [automated]
claude May 3, 2026
d36cf48
daily-paper: +2 papers for 2026-05-04 [automated]
claude May 4, 2026
8096209
daily-paper: +2 papers for 2026-05-05 [automated]
claude May 5, 2026
150b1c8
docs: update CVD count to 10 disclosures sent 2026-04-07
adrianwedd May 6, 2026
10fa7f3
docs: update metrics, remove og-image, fix paper venue and CVD count
adrianwedd May 6, 2026
30c61d7
docs: update citation bibtex metrics and last-updated date
adrianwedd May 6, 2026
e8a010c
chore: sync MANIFEST.json corpus totals to canonical metrics (258 mod…
adrianwedd May 6, 2026
0afba56
docs: add harm classes count to headline stats (Gemini QA suggestion)
adrianwedd May 6, 2026
a76e8e4
chore: sync stats.ts to canonical metrics (2026-05-06)
adrianwedd May 6, 2026
d3a0273
docs(about): sync stats and paper venue to canonical (2026-05-06)
adrianwedd May 6, 2026
b6c7151
chore: add temporary history cleanup workflow for issue 696
adrianwedd May 7, 2026
595f524
chore(deps): bump postcss from 8.5.6 to 8.5.14 in /site
dependabot[bot] May 7, 2026
The diff you're trying to view is too large. We only load the first 3000 changed files.
Binary file removed .DS_Store
Binary file not shown.
75 changes: 75 additions & 0 deletions .github/workflows/history-cleanup-696.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
name: History cleanup 696

on:
  workflow_dispatch:

permissions:
  contents: write

env:
  FILTER_DATE: '2026-05-08'

jobs:
  cleanup:
    runs-on: ubuntu-latest
    timeout-minutes: 120
    steps:
      - name: Install git-filter-repo
        run: |
          set -euo pipefail
          python3 -m pip install --user git-filter-repo
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"

      - name: Mirror, filter, verify, and force-push
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          set -euo pipefail
          git clone --mirror "https://x-access-token:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git" ff-working.git
          cd ff-working.git

          bytes_for_dir() {
            find "$1" -type f -print0 | xargs -0 stat -c %s | awk '{s+=$1} END {print s+0}'
          }

          BASELINE_COMMITS=$(git rev-list --all --count)
          BASELINE_REFS=$(git for-each-ref --format='%(refname)' | wc -l | tr -d ' ')
          BASELINE_BYTES=$(bytes_for_dir .)
          printf 'BASELINE commits=%s refs=%s bytes=%s\n' "$BASELINE_COMMITS" "$BASELINE_REFS" "$BASELINE_BYTES"

          git filter-repo --strip-blobs-bigger-than 10M --force
          git filter-repo \
            --invert-paths \
            --path docs/video/ \
            --path docs/audio/ \
            --path site/public/video/ \
            --path site/public/audio/ \
            --path-glob 'docs/images/**/*.mp4' \
            --path-glob 'docs/images/**/*.m4a' \
            --path-glob 'site/public/images/**/*.mp4' \
            --path-glob 'site/public/images/**/*.m4a' \
            --force

          POST_COMMITS=$(git rev-list --all --count)
          POST_REFS=$(git for-each-ref --format='%(refname)' | wc -l | tr -d ' ')
          POST_BYTES=$(bytes_for_dir .)
          printf 'POST commits=%s refs=%s bytes=%s\n' "$POST_COMMITS" "$POST_REFS" "$POST_BYTES"
          git count-objects -vH

          test "$POST_BYTES" -lt 2000000000
          test "$POST_COMMITS" -le "$BASELINE_COMMITS"
          test "$POST_COMMITS" -ge $(( BASELINE_COMMITS / 2 ))
          test "$POST_REFS" -eq "$BASELINE_REFS"

          git filter-repo --analyze --force
          if grep -E '\.(mp4|m4a|mp3|wav|ogg)$' filter-repo/analysis/extensions-all-sizes.txt; then
            echo 'media extensions remain after filter' >&2
            exit 1
          fi
          if grep -E '<present> (docs/video|docs/audio|site/public/video|site/public/audio|docs/images/.*\.(mp4|m4a)|site/public/images/.*\.(mp4|m4a))' filter-repo/analysis/path-all-sizes.txt; then
            echo 'targeted media paths remain after filter' >&2
            exit 1
          fi

          git remote set-url origin "https://x-access-token:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git"
          git push --force --prune origin '+refs/heads/*:refs/heads/*' '+refs/tags/*:refs/tags/*'
12 changes: 12 additions & 0 deletions .gitignore
@@ -0,0 +1,12 @@
# Afterwords TTS voice override
.afterwords

# OS files
.DS_Store

# IDE
.vscode/

# Superpowers brainstorm artifacts
.superpowers/
.wrangler/
91 changes: 49 additions & 42 deletions CONTRIBUTING.md
@@ -1,62 +1,69 @@
# Contributing to Failure-First Embodied AI
# Contributing to Failure-First

Thank you for your interest in contributing to Failure-First Embodied AI!
Thank you for your interest in Failure-First. This is a **research project**, not a typical open-source codebase. Contributions are welcome, but the ways to contribute differ from a standard software project.

## Important: Public Repository Context
## How to Contribute

This is the **public-facing** repository for the Failure-First research project. Contributions must adhere to strict safety guidelines to ensure all content remains:
- Pattern-level only (never operational)
- Defensively purposed
- Appropriate for public academic discourse
### Report Issues

## What to Contribute
If you find errors in our published findings, methodology gaps, broken links on [failurefirst.org](https://failurefirst.org), or inconsistencies in the public documentation, please open a GitHub issue.

**✅ Welcome Contributions:**
- Documentation improvements
- Research methodology clarifications
- Failure taxonomy additions (pattern-level)
- Website improvements
- Typo fixes and clarity improvements
### Cite Our Work

**❌ Not Accepted:**
- Operational exploit code
- Working jailbreak prompts
- Model-specific bypass techniques
- Raw test results or adversarial datasets
The most impactful contribution for a research project is citation. If our findings, datasets, or methodology inform your work, please cite us:

## Contribution Process
```bibtex
@software{failure_first_2026,
  title = {Failure-First: Adversarial Evaluation Framework for Embodied AI},
  author = {Wedd, Adrian},
  year = {2026},
  url = {https://failurefirst.org},
  note = {258 models, 142{,}307 prompts, 346 attack techniques}
}
```

1. **Fork** the repository
2. **Create a branch** for your changes
3. **Make your changes** following our guidelines
4. **Submit a pull request** with a clear description
### Red-Team Collaboration

## Safety Review
We welcome collaboration with AI safety researchers, red-team practitioners, and frontier lab security teams. If you have adversarial evaluation results, novel attack technique taxonomies, or defense effectiveness data you would like to contribute or cross-validate, open a GitHub issue describing your institutional affiliation and research focus.

### Dataset Contributions

If you have adversarial evaluation datasets that could strengthen the corpus, we accept contributions subject to:

- **Pattern-level only**: no operational exploits or copy-paste attack templates
- **Provenance documented**: source, collection methodology, and intended use
- **Schema compliance**: data must conform to our versioned JSON Schemas (documented in the private repository; we will assist with formatting)
- **Safety review**: all contributed data undergoes review before inclusion

### Documentation Improvements

Corrections, clarifications, and improvements to public-facing documentation (this repository, the design charter, security policy) are welcome via pull request.

All contributions undergo safety review to ensure:
- No operational exploit instructions
- Pattern-level descriptions only
- Appropriate for public repository
- Aligned with defensive research mission
## What We Do Not Accept

## Code of Conduct
- Operational exploit code or working jailbreak prompts
- Model-specific bypass techniques intended for attack
- Raw adversarial datasets without provenance
- Content that facilitates real-world harm outside AI safety research

- Be respectful and professional
- Focus on defensive AI safety research
- No weaponization of research findings
- Maintain academic integrity
## Vulnerability Reporting

## Questions?
If you discover vulnerabilities in AI systems -- whether through this framework or independent research -- please follow responsible disclosure practices. See [SECURITY.md](SECURITY.md) for our coordinated disclosure process.

- **Issues**: Open a GitHub issue for questions or suggestions
- **Discussions**: Use GitHub Discussions for research-related conversations
## Process

1. Open a GitHub issue describing the proposed contribution
2. For documentation changes, submit a pull request directly
3. For research collaborations and dataset contributions, we will coordinate via issue discussion

## Safety Review

All contributions undergo safety review to ensure content remains pattern-level, defensively purposed, and appropriate for a public repository. This review is not optional and applies equally to maintainers and external contributors.

## License

By contributing, you agree that your contributions will be licensed under the MIT License, the same license as this project.
By contributing, you agree that your contributions will be licensed under the MIT License.

---

**Remember:** This is defensive AI safety research. All contributions should strengthen defenses, not enable attacks.

**Last updated:** 2026-02-01
**Last updated:** 2026-05-06
40 changes: 29 additions & 11 deletions DESIGN_CHARTER.md
@@ -22,7 +22,14 @@ This is a **research methodology for studying AI safety through systematic failu

At its center is a principle: **failure is signal, not noise**.

The framework exists to support *rigorous failure analysis, defensive research, and safety boundary mapping*.
The framework exists to support *rigorous failure analysis, defensive research, and safety boundary mapping* across the full landscape of adversarial AI evaluation:

- **Jailbreak archaeology**: systematic study of how adversarial techniques evolve across eras, from early DAN-style prompts through crescendo attacks, format-lock exploitation, and reasoning-chain manipulation
- **VLA safety evaluation**: 42 attack families targeting vision-language-action models, covering affordance manipulation, kinematic injection, safety instruction dilution, and dual-layer attacks
- **Multi-turn escalation**: stateful attack sequences that exploit context accumulation in reasoning and tool-using models
- **Format-lock attacks**: structured output compliance (JSON, YAML, code) as an attack vector against safety training
- **Defense effectiveness measurement**: quantifying how safety mechanisms perform under adversarial pressure, including positional bias, iatrogenic effects, and cross-model vulnerability inheritance
- **Classifier reliability**: documenting the gap between heuristic and LLM-graded attack success rates

This is not a demonstration.
It is not an attack platform.
@@ -59,7 +66,7 @@ The tooling must not pretend otherwise.
These are not preferences.
They are constraints.

If an implementation violates any of the following, it is wrong—even if it "works".
If an implementation violates any of the following, it is wrong -- even if it "works".

### 3.1 Pattern-Level Only, Never Operational

@@ -109,6 +116,14 @@ If an implementation violates any of the following, it is wrong—even if it "wo
- No security through obscurity
- If the process cannot withstand scrutiny, it should not be published

### 3.7 Statistical Rigor Over Convenience

- All ASR claims must specify grading methodology (LLM-only, heuristic-only, or coalesced)
- Confidence intervals are required for sample-based estimates
- Multiple-comparison corrections are mandatory for pairwise model comparisons
- Classifier reliability must be documented with inter-rater agreement metrics
- If a finding cannot survive peer review scrutiny, it should not be published
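
The confidence-interval requirement above can be illustrated with a Wilson score interval, the interval type the charter names under Scoring. This is a minimal sketch, not the project's implementation: the function name `wilson_interval` and the default `z = 1.96` (roughly 95% coverage) are assumptions made for illustration.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as an ASR.

    `z` is the standard-normal quantile; 1.96 gives roughly 95% coverage.
    The name and defaults are illustrative, not taken from the project.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    # The centre is pulled toward 0.5, which keeps small samples honest.
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z2 / (4 * trials * trials)
    ) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(50, 100)
    print(f"ASR 50/100: [{low:.4f}, {high:.4f}]")
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative when the observed rate is 0 or 1, which matters for models that refuse every prompt in a small sample.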

---

## 4. Architectural Decisions (Final)
@@ -121,8 +136,10 @@ They are no longer exploratory.
- **Validation**: JSON Schema + custom invariant checks
- **Safety gates**: Heuristic linters + manual review
- **Benchmark runners**: Model-agnostic (CLI templates + HTTP API adapters)
- **Scoring**: Refusal quality, recovery mechanisms, attack success rates
- **Scoring**: Three-tier ASR (strict, broad, functionally dangerous) with Wilson CIs
- **Classification**: LLM-graded (FLIP protocol) with documented grader reliability
- **Version control**: Git, with CI enforcement of validation/linting
- **Database**: SQLite corpus unifying prompts, techniques, and evaluation results
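
"Documented grader reliability" implies an agreement statistic between graders. The charter does not say which metric is used, so the sketch below assumes Cohen's kappa between two graders (for example a heuristic classifier and an LLM grader). The function name and the `complied` / `refused` labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two graders, corrected for chance.

    Assumed metric for illustration; the charter only requires that
    inter-rater agreement be documented.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)
    # Fraction of items on which the two graders gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each grader's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both graders used one identical label throughout; kappa is
        # undefined (0/0), so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    heuristic = ["complied", "complied", "refused", "refused"]
    llm_grader = ["complied", "refused", "refused", "refused"]
    print(cohens_kappa(heuristic, llm_grader))
```

Raw percent agreement overstates reliability when one label dominates, which is why a chance-corrected statistic is the more defensible number to report alongside ASR figures.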

If a future change requires revisiting these decisions, the charter must be updated explicitly.

@@ -152,7 +169,7 @@ If adversarial content can be weaponized directly, it has violated this charter.

This document defines intent and limits.

Implementation details—including tool design, algorithms, dataset schema extensions, benchmark metrics, and acceptance criteria—are defined in issues, documentation, and code.
Implementation details -- including tool design, algorithms, dataset schema extensions, benchmark metrics, and acceptance criteria -- are defined in issues, documentation, and code.

The relationship is explicit:

@@ -210,7 +227,7 @@ When using AI systems to generate adversarial scenarios:

## 9. Research Ethics Boundaries

This project operates within established AI safety research norms:
This project operates within established AI safety research norms. A full research ethics charter is maintained in the private repository.

### Acceptable Research Activities

@@ -219,6 +236,7 @@ This project operates within established AI safety research norms:
- Testing robustness of safety mechanisms
- Improving alignment under adversarial pressure
- Publishing defensive research findings
- Coordinated vulnerability disclosure to model providers

### Unacceptable Activities

@@ -230,10 +248,10 @@ This project operates within established AI safety research norms:

### Disclosure Standards

- Vulnerabilities discovered through this research should be disclosed responsibly
- Real-world safety issues should be reported to affected parties before public disclosure
- Research findings should distinguish between controlled evaluation and real-world risk
- Limitations of evaluation harnesses must be stated explicitly
- Vulnerabilities discovered through this research are disclosed responsibly
- Real-world safety issues are reported to affected parties before public disclosure
- Research findings distinguish between controlled evaluation and real-world risk
- Limitations of evaluation harnesses are stated explicitly

---

@@ -249,8 +267,8 @@ This charter may evolve as the project grows, but changes must be:
Minor clarifications (typo fixes, example additions) do not require versioning.
Substantive changes (adding/removing principles, changing constraints) require charter version increment.

**Current version**: 1.0
**Last updated**: 2025-01-11
**Current version**: 2.0
**Last updated**: 2026-03-29

---

17 changes: 10 additions & 7 deletions MANIFEST.json
@@ -3,14 +3,17 @@
"note": "Full traces available under NDA. Contact via GitHub issue.",
"generated_from": "failure-first-embodied-ai (private)",
"totals": {
"files": 632,
"files": 860,
"invariant_errors": 0,
"json_parse_errors": 0,
"rows": 51201,
"rows": 60847,
"schema_errors": 0,
"failure_classes": 661,
"domains": 19,
"models_evaluated": 51
"prompts": 142307,
"results": 140794,
"techniques": 346,
"harm_classes": 139,
"domains": 28,
"models_evaluated": 258
},
"packs_by_kind": {
"adversarial_poetry": 3,
@@ -493,7 +496,7 @@
"validation_ok": true
},
{
"path": "data/generated_attacks/massive_scale/expanded/Conceptual_Semantic_M\u00f6bius_Strip.jsonl",
"path": "data/generated_attacks/massive_scale/expanded/Conceptual_Semantic_Möbius_Strip.jsonl",
"rows": 81,
"bytes": 59756,
"pack_kind": "massive_scale_expanded",
@@ -1354,4 +1357,4 @@
"validation_ok": true
}
]
}
}