259 commits
69b8a6f
feat(research): field context page — AI landscape 2024–2026
adrianwedd Mar 1, 2026
e7ada06
Create CNAME
adrianwedd Mar 1, 2026
11d348e
fix(docs): restore docs/ after accidental deletion, add field-context…
adrianwedd Mar 1, 2026
24e1890
feat(reports): add reports 42-44 and blog posts — VLA adversarial tra…
adrianwedd Mar 1, 2026
9f5e532
fix(blog): add adrian.png hero image to three new blog posts
adrianwedd Mar 1, 2026
461f1a0
fix(people): add adrian.webp (26KB, 400px) to about/people; remove bl…
adrianwedd Mar 1, 2026
70ddf74
feat(reports): add reports 45-46 and blog posts — inference trace man…
adrianwedd Mar 1, 2026
083e649
feat(people): scaffold persona profile pages for all 9 research colle…
adrianwedd Mar 1, 2026
bd60920
feat(people): complete Clara Oswald profile page
adrianwedd Mar 1, 2026
123134f
feat(about): complete Rose Tyler profile page — adversarial ops, VLA …
adrianwedd Mar 1, 2026
b862e17
feat(about): complete Bill Potts profile page — curation philosophy, …
adrianwedd Mar 1, 2026
02631f4
feat(people): complete Amy Pond profile page — benchmark coverage, ph…
adrianwedd Mar 1, 2026
24e69ed
feat(about): complete River Song profile page — GLI methodology, thre…
adrianwedd Mar 1, 2026
a9f58e9
feat(people): complete Romana profile page — stats philosophy, eviden…
adrianwedd Mar 1, 2026
2fc09e7
feat(about): complete Martha Jones profile page — policy engagement a…
adrianwedd Mar 1, 2026
53faa13
feat(people): complete Yasmin Khan profile page
adrianwedd Mar 1, 2026
f10e55b
feat(people): fill in Donna Noble profile page — editorial standards …
adrianwedd Mar 1, 2026
23f44d2
feat(people): add Tegan Jovanka — Legal Research Analyst profile page
adrianwedd Mar 1, 2026
cdbd0b9
feat(site): Tegan Jovanka + Nyssa of Traken profile pages
adrianwedd Mar 1, 2026
ea850ed
feat(site): Yasmin — build Nyssa profile, Tegan/Nyssa avatars, 2 new …
adrianwedd Mar 1, 2026
ad52f88
fix(site): remove TARDIS reference from people index, fix Adrian phot…
adrianwedd Mar 1, 2026
36cc1a7
refactor(site): replace sprint-specific priorities with thematic prof…
adrianwedd Mar 1, 2026
da81751
feat(blog): publish 10 research briefs from March 2026 sprint
adrianwedd Mar 1, 2026
9ebdd60
fix(site): QA sweep — retracted claims, stale metrics, missing reports
adrianwedd Mar 1, 2026
bc04607
feat(search): add Pagefind full-text search to failurefirst.org
adrianwedd Mar 2, 2026
f756fab
feat(glossary): publish 80+ term glossary to failurefirst.org
adrianwedd Mar 2, 2026
297f74c
feat(blog): publish AI safety lab independence structural analysis
adrianwedd Mar 2, 2026
dddc709
fix(site): centralize stats, fix nav accessibility, correct stale num…
adrianwedd Mar 2, 2026
c94a1e5
feat(analytics): add LinkedIn Insight Tag (partner 8890876)
adrianwedd Mar 2, 2026
f88920a
fix(site): P0 sensor-grid perf + image fallback + GA4 analytics + laz…
adrianwedd Mar 2, 2026
6ebace2
fix(site): data consistency — align stale numbers with stats.ts sourc…
adrianwedd Mar 2, 2026
0cf0307
fix(site): remaining P2 QA fixes — services stats, eras, mobile nav
adrianwedd Mar 2, 2026
adfd239
fix(site): P3 QA fixes — privacy policy, home SEO, ARIA dropdown sync
adrianwedd Mar 2, 2026
633d24f
fix(site): triple-QA audit — stats, accuracy, banned language, access…
adrianwedd Mar 2, 2026
a067eae
feat(blog): deploy NSW WHS AI compliance and safety lab independence …
adrianwedd Mar 2, 2026
f5fe99c
data: publish refreshed research visualizations artifact
adrianwedd Mar 6, 2026
c3b416a
blog: deploy 5 new posts from sprint-25 research
adrianwedd Mar 11, 2026
161f509
blog: world model attack surface taxonomy — 5 adversarial categories …
adrianwedd Mar 11, 2026
612e4ed
blog: 3 new threat horizon posts — actuator gap, alignment regression…
adrianwedd Mar 11, 2026
e4bd2c2
blog: 4 new posts — compliance paradox, System T/S, classifier qualit…
adrianwedd Mar 11, 2026
29e91da
build: rebuild site with 4 new blog posts (53 total)
adrianwedd Mar 11, 2026
b5a29ae
feat: add /new/ page — aggregated content feed sorted by date
adrianwedd Mar 11, 2026
2ce40e4
docs: update Donna Noble profile with sprint 25-26 accomplishments
adrianwedd Mar 11, 2026
04b7b6f
docs: update Clara Oswald profile with System T/S and inter-model agr…
adrianwedd Mar 11, 2026
5082417
docs: update Bill Potts profile with HANSE gap-fill and dataset expan…
adrianwedd Mar 11, 2026
f1136ff
docs: update Nyssa of Traken profile with wave 1-2 research accomplis…
adrianwedd Mar 11, 2026
faf9c12
docs: update Rose Tyler profile with wave 1-2 adversarial ops accompl…
adrianwedd Mar 11, 2026
b3487b9
docs: update Martha Jones profile with sprint 25-26 policy accomplish…
adrianwedd Mar 11, 2026
3edb267
docs: update Tegan Jovanka profile — wave 1-2 legal research accompli…
adrianwedd Mar 11, 2026
f89d8e7
docs: update River Song profile — wave 1-2 accomplishments and GLI ex…
adrianwedd Mar 11, 2026
fe43ff1
docs: update Donna Noble profile with wave 1-3 accomplishments
adrianwedd Mar 11, 2026
279b40a
docs: update Rose Tyler profile with wave 1-3 accomplishments
adrianwedd Mar 11, 2026
ce79f0d
docs: update Tegan Jovanka profile with wave 1-3 accomplishments
adrianwedd Mar 11, 2026
c5ef48c
docs: update Yasmin Khan profile with wave 1-3 accomplishments
adrianwedd Mar 11, 2026
773fe58
docs: update Martha Jones profile with wave 1-3 accomplishments
adrianwedd Mar 11, 2026
872ede9
docs: update Bill Potts profile with wave 1-3 accomplishments
adrianwedd Mar 11, 2026
b5c23a8
build: rebuild site with Clara Oswald profile updates
adrianwedd Mar 11, 2026
8967eb3
docs: update Amy Pond profile with sprint-26 accomplishments
adrianwedd Mar 11, 2026
2a2a712
fix: restore Martha Jones profile updates lost in stash conflict
adrianwedd Mar 11, 2026
d705262
docs: update River Song profile — wave 4 accomplishments
adrianwedd Mar 11, 2026
fe8d010
docs: update Donna Noble profile — wave 4 accomplishments
adrianwedd Mar 11, 2026
1897e45
docs: update Martha Jones profile — wave 4 accomplishments
adrianwedd Mar 11, 2026
1aefe7c
docs: update Nyssa of Traken profile — wave 4 accomplishments
adrianwedd Mar 11, 2026
3b40717
docs: update Tegan Jovanka profile — wave 4 accomplishments
adrianwedd Mar 11, 2026
5664585
docs: update Clara Oswald profile — wave 4 accomplishments
adrianwedd Mar 11, 2026
014fe68
docs: update Rose Tyler profile — wave 4 accomplishments
adrianwedd Mar 11, 2026
2c3bef2
docs: update Bill Potts profile — wave 4 accomplishments
adrianwedd Mar 11, 2026
bca4bd2
blog: add reproducibility crisis in adversarial evaluation post
adrianwedd Mar 11, 2026
5af564f
docs: update Romana profile with wave 4 findings (n=10 duplication, r…
adrianwedd Mar 11, 2026
f54b768
blog: When Your Safety Grader Is Wrong — crescendo regrade story
adrianwedd Mar 11, 2026
ad45cd9
refactor: consolidate nav from 10 to 6 top-level items, update footer
adrianwedd Mar 11, 2026
0dfefb9
feat: daily papers March 9-12 (Tree of Attacks, Safety in Numbers, EI…
adrianwedd Mar 11, 2026
7ac88a6
blog: deploy "The Action Layer Has No Guardrails" — text-action safet…
adrianwedd Mar 11, 2026
0700347
blog: GLI governance gap analysis -- 5.5 years median lag, 90% null e…
adrianwedd Mar 11, 2026
e1f5761
blog: deploy "The Attack You Can't See" — SBA evaluation blindspot in…
adrianwedd Mar 11, 2026
ef95e08
blog: Three Vectors — embodied AI risk convergence analysis (post #59)
adrianwedd Mar 15, 2026
0806f7a
feat: update Adrian's profile photo on about/people page
adrianwedd Mar 15, 2026
cde4aa0
blog: deploy "Three Vectors" built pages (post #59)
adrianwedd Mar 15, 2026
e3258b5
content: add 7 daily papers (Feb 22-24) + 4 infographics
adrianwedd Mar 15, 2026
2d765cf
build: rebuild site index (pagefind, sitemap, RSS)
adrianwedd Mar 15, 2026
3881e03
build: rebuild site with updated profile photo
adrianwedd Mar 15, 2026
bcc1c2b
perf: optimize profile photo — 7.9MB PNG to 20KB WebP (400x400)
adrianwedd Mar 15, 2026
7fb7cee
blog: The Inverse Detectability-Danger Law — IDDL synthesis (post #60)
adrianwedd Mar 15, 2026
ef3b4dd
blog: Competence-Danger Coupling — the capability that makes robots u…
adrianwedd Mar 15, 2026
913c2ad
blog: deploy Embodied AI Threat Triangle post (#62)
adrianwedd Mar 15, 2026
67b98f6
blog: deploy The Unintentional Adversary post (#63)
adrianwedd Mar 15, 2026
bc680b0
blog: We Rebooted a Robot by Guessing 1234 — IMB attack class (post #63)
adrianwedd Mar 15, 2026
3764d41
blog: deploy "The U-Curve of AI Safety" (post #65)
adrianwedd Mar 15, 2026
56961ae
blog: deploy "The State of Embodied AI Safety, March 2026" (post #66)
adrianwedd Mar 15, 2026
3c6bb43
housekeeping: clean stale build assets from site rebuild
adrianwedd Mar 15, 2026
8106567
deploy: 8 audio + 7 video NLM assets, add video frontmatter to 11 papers
adrianwedd Mar 16, 2026
2c1d150
feat: add Bluesky domain verification for @failurefirst.org
adrianwedd Mar 16, 2026
8e14c5f
content: deploy NLM assets — 1 video (2307.14539) + 3 infographics (2…
adrianwedd Mar 17, 2026
06efea3
content: deploy NLM assets for Mar 16-18 papers (3 infographics + 3 a…
adrianwedd Mar 17, 2026
c9cb2eb
content: publish 6 daily papers (Mar 13-18) with audio/image assets
adrianwedd Mar 17, 2026
b287346
deploy: rebuild site with latest content (6 daily papers, media assets)
adrianwedd Mar 18, 2026
b3a57d8
fix: include CSS/JS assets and pagefind index from Astro rebuild
adrianwedd Mar 18, 2026
5f89fbf
blog: Haidilao robot incident analysis — when crazy dance met reality…
adrianwedd Mar 18, 2026
019666b
blog: 10 incident analysis posts — embodied AI safety incident database
adrianwedd Mar 18, 2026
0f82476
blog: 5 new incident posts + references on all 10 + 7 embedded videos
adrianwedd Mar 18, 2026
99b00de
feat: embed video player in blog posts (was download-only badge)
adrianwedd Mar 18, 2026
099b267
feat: enhanced analytics — video completion, downloads, 404s, social …
adrianwedd Mar 18, 2026
db76dcc
fix: remove redundant video download badge — player is inline
adrianwedd Mar 18, 2026
de2156f
perf: convert all images to WebP — 413MB -> 49MB (88% reduction)
adrianwedd Mar 18, 2026
2a22ffd
blog: 3 new posts — EU AI Act countdown, defense impossibility theore…
adrianwedd Mar 18, 2026
d4f7885
Create CNAME
adrianwedd Mar 18, 2026
42d0c06
blog: wave 3 deploy — defense impossibility, iatrogenesis, EU AI Act …
adrianwedd Mar 18, 2026
376e5ff
fix: add WebP profile pics for team page (web_*.jpg -> web_*.webp)
adrianwedd Mar 18, 2026
2d449d0
content: daily paper #60 — Safety is Non-Compositional (arXiv:2603.15…
adrianwedd Mar 19, 2026
6b03956
content: NLM assets for daily paper #60 (2603.15973) — infographic + …
adrianwedd Mar 19, 2026
68ce061
fix: remove 20 broken audio links, stale visualization artifact, brok…
adrianwedd Mar 19, 2026
8b856dc
blog: 5 new posts from Wave 4-6 findings (polypharmacy, field manual,…
adrianwedd Mar 19, 2026
15be45b
blog: add 2 posts -- non-compositional safety proof + incident databa…
adrianwedd Mar 19, 2026
b5d24f6
blog: 3 ethics-focused posts — independence, alignment faking, dual-u…
adrianwedd Mar 19, 2026
49f8c33
blog: 2 posts from Wave 9 findings -- context collapse + safety train…
adrianwedd Mar 19, 2026
7d85f54
deploy: Q1 2026 State of Embodied AI Safety post
adrianwedd Mar 22, 2026
3fd8898
nav: rename "Policy & Services" to "Services", link to /services/
adrianwedd Mar 22, 2026
c805bf9
daily-paper: 3 papers for Mar 23 — PreSafe, Agentic Pressure, VROP
adrianwedd Mar 22, 2026
9370898
deploy: nav restructure (Services), 4 daily papers, Q1 summary post
adrianwedd Mar 22, 2026
55710d1
design: 6 visual upgrades from adrianwedd.com + afterwords
adrianwedd Mar 22, 2026
de7f799
fix: scope card paragraph margin to last-child only
adrianwedd Mar 22, 2026
0065481
fix: resolve QA findings — stubs, private links, stale counts, language
adrianwedd Mar 22, 2026
783532e
deploy: design upgrades + QA fixes (scroll-reveal, serif headings, ca…
adrianwedd Mar 22, 2026
325b09d
daily-paper: re-date to one paper per day (fill Mar 20, 21 gaps)
adrianwedd Mar 22, 2026
0ca33fe
data: enrich company directory — stage, founded, safetyNotes for top 51
adrianwedd Mar 22, 2026
984f1fb
data: enrich AI safety orgs — country, founded, keyPrograms for all t…
adrianwedd Mar 22, 2026
93c04d9
deploy: enriched directories (51 companies, 117 orgs), daily paper da…
adrianwedd Mar 22, 2026
2e95505
deploy: add 3 blog posts (DETECTED_PROCEEDS, capability-safety decoup…
adrianwedd Mar 22, 2026
f5aaa7e
site: deploy 2 blog posts -- defense effectiveness + autonomous attac…
adrianwedd Mar 23, 2026
c6478f5
blog: Five Predictions for AI Safety in Q2 2026
adrianwedd Mar 23, 2026
b7bc0a2
site: refresh all stats to current canonical values (190 models, 141K…
adrianwedd Mar 23, 2026
a3ae201
blog: We're Publishing Our Iatrogenesis Research -- Here's Why
adrianwedd Mar 23, 2026
6496980
deploy: add 'We Were Wrong: AI Safety Defenses Do Work' blog post
adrianwedd Mar 23, 2026
57cadc4
blog: First Look Inside AI Safety Mechanisms -- refusal geometry find…
adrianwedd Mar 23, 2026
26c89e6
feat: add LegalMemoLayout with mandatory disclaimer banner
adrianwedd Mar 23, 2026
e9545bc
feat: add reports, legal, policy-docs content collections with auto-g…
adrianwedd Mar 23, 2026
1b42b94
nav: add Research Reports and Legal Analysis to navigation
adrianwedd Mar 23, 2026
37a3113
content: publish 15 reports (#169-183), 6 legal memos (LR-48-53), 4 p…
adrianwedd Mar 23, 2026
c90c108
deploy: publish 15 reports, 6 legal memos, 4 policy docs + new collec…
adrianwedd Mar 23, 2026
5f3656a
fix: rebuild site with correct asset hashes
adrianwedd Mar 23, 2026
770578a
people: add Leela, K-9, Sarah Jane Smith profiles + update index
adrianwedd Mar 23, 2026
e16ed43
design: full-viewport hero sections with generative canvas backgrounds
adrianwedd Mar 23, 2026
935bb0f
deploy: full-viewport hero sections with generative canvas (neural, f…
adrianwedd Mar 23, 2026
3dfceb6
deploy: 3 new people pages (Leela, K-9, Sarah Jane Smith)
adrianwedd Mar 23, 2026
520d33b
site: full-viewport animated heroes + canvas backgrounds across all 6…
adrianwedd Mar 23, 2026
9470d2a
site: update all 14 agent profile pages with current priorities + voi…
adrianwedd Mar 23, 2026
0a6f943
deploy: rebuild failurefirst.org (2 files, 633 pages)
adrianwedd Mar 23, 2026
4b29e09
deploy: rebuild failurefirst.org with Wave 1-3 content
adrianwedd Mar 23, 2026
740a57c
site: Wave 1-3 content deploy — 5 new blog posts, March roundup, 638 …
adrianwedd Mar 24, 2026
86aff90
deploy: rebuild failurefirst.org (2 files, 638 pages)
adrianwedd Mar 24, 2026
7de802c
site: Wave 1-5 final deploy — papers page, 640 pages, format-lock par…
adrianwedd Mar 24, 2026
9e470a9
blog: 5 new posts — detected proceeds, polyhedral safety, EU complian…
adrianwedd Mar 24, 2026
b7d3d50
site: rewrite all 14 agent profiles — punchy, consistent, externally-…
adrianwedd Mar 24, 2026
f223fda
deploy: +6 blog posts (AdvBench results, Qwen3 safety, detected-proce…
adrianwedd Mar 24, 2026
27db3ca
site: source files for Wave 1-8 — HeroSection canvas animations, agen…
adrianwedd Mar 24, 2026
5c42e4e
fix: resolve #566 — people page cards not clickable due to z-index st…
adrianwedd Mar 24, 2026
be0ed8f
blog: add reasoning-level DETECTED_PROCEEDS and defense evolver posts
adrianwedd Mar 24, 2026
eaeac3d
build: rebuild site — 653 pages, z-index fix + 2 new blog posts
adrianwedd Mar 24, 2026
4f738f7
site: add service offering page + 2 blog posts (656 pages)
adrianwedd Mar 24, 2026
3178032
blog: Free AI Safety Score announcement post
adrianwedd Mar 24, 2026
aec11dc
site: rebuild with Free AI Safety Score blog post (657 pages)
adrianwedd Mar 24, 2026
23c7ffe
blog: deploy annual report, framework integrations, threat horizon v3
adrianwedd Mar 24, 2026
b5f37f2
site: rebuild with 3 new posts (660 pages)
adrianwedd Mar 24, 2026
89beb77
blog: Compliance Cascade — a new class of AI jailbreak
adrianwedd Mar 24, 2026
9f876fd
blog: The Epistemic Crisis — can we trust AI safety benchmarks?
adrianwedd Mar 24, 2026
bb838ca
blog: First results from Ollama Cloud testing
adrianwedd Mar 24, 2026
2281d4e
build: site rebuild with 3 new blog posts (663 pages)
adrianwedd Mar 24, 2026
e235e66
blog: CARTO — The First AI Red Team Certification announcement
adrianwedd Mar 24, 2026
b561bfc
chore(site): rebuild docs/ — 664 pages, CARTO blog post live
adrianwedd Mar 24, 2026
b4177b4
blog: frontier model safety — 1.1T parameters does not mean safe
adrianwedd Mar 24, 2026
b00635e
blog: reasoning-level DETECTED_PROCEEDS confirmed across 3 providers
adrianwedd Mar 24, 2026
38b1303
blog: research papers landing page with 3 pre-arXiv abstracts
adrianwedd Mar 24, 2026
cf2a6bb
site: rebuild with 3 new blog posts (667 pages)
adrianwedd Mar 24, 2026
465f7f0
blog: Format-Lock — The Universal AI Jailbreak
adrianwedd Mar 24, 2026
239290d
blog: CARTO Beta — First 10 Testers Wanted
adrianwedd Mar 24, 2026
3f137e7
blog: Update services page with frontier data and Model Safety Scorecard
adrianwedd Mar 24, 2026
0395038
build: site rebuild — 669 pages (+3 new blog posts)
adrianwedd Mar 24, 2026
714ffc9
site: add autoplay voice intros to all 14 companion profile pages
adrianwedd Mar 25, 2026
88d0bac
site: compress voice intros from WAV to OGG Opus (31MB -> 8.3MB)
adrianwedd Mar 25, 2026
e7c8ce7
content: add 4 blog posts — threat horizon, regulatory gap, iatrogene…
adrianwedd Mar 25, 2026
cf527b2
blog: add Threat Horizon Digest #1 and service tier announcement
adrianwedd Mar 25, 2026
72f0cb8
site: add 3-tier assessment pricing to services page
adrianwedd Mar 25, 2026
4d05c5d
site: rebuild docs/ with 2 new blog posts — threat horizon digest + s…
adrianwedd Mar 25, 2026
2708fb6
blog: pre-draft 3 arXiv preprint announcement posts (draft=true)
adrianwedd Mar 25, 2026
4489ef4
blog: deploy 3 new posts -- emotional AI ethics, safety awareness par…
adrianwedd Mar 25, 2026
429e068
blog: deploy State of AI Safety Q1 2026 -- flagship quarterly assessment
adrianwedd Mar 25, 2026
aa432d1
site: update Adrian profile + re-clone K-9, Amy, Rose voice intros
adrianwedd Mar 25, 2026
e0ecce1
images: add/replace 5 agent profile images (Leela, Sarah Jane, K-9, N…
adrianwedd Mar 25, 2026
fe1635d
feat: snap-scroll team page with neural canvas and audio intros
adrianwedd Mar 25, 2026
07e735c
fix: resolve 10 QA-identified bugs in team page implementation
adrianwedd Mar 25, 2026
9d90fe4
team: K-9 as closer with CTA to /services, reorder agents
adrianwedd Mar 25, 2026
cb9c3a4
team: first-name-only bios + 8 rewrites (River Song quality bar)
adrianwedd Mar 25, 2026
e18e710
audio: final synthesis — all 14 first-name-only voice intros + OGG + MP3
adrianwedd Mar 25, 2026
67b2b1e
deploy: final build with first-name audio + team page
adrianwedd Mar 25, 2026
840fc6e
deploy: fresh build — verified neural canvas + audio in bundles
adrianwedd Mar 25, 2026
89bae45
audio: update 12 agent voice intros (OGG + MP3)
adrianwedd Mar 26, 2026
95a5061
deploy: update 12 agent voice intros (first-name resynth)
adrianwedd Mar 26, 2026
c40413c
deploy: team page refresh, voice regen, NLM backfill
adrianwedd Mar 26, 2026
232aab6
deploy: standalone CTA card, Rose voice (in-character re-clone)
adrianwedd Mar 26, 2026
7019880
deploy: rebrand to Failure-First, add 23 foundational daily papers, f…
adrianwedd Mar 26, 2026
16b75aa
deploy: complete rebrand — replace all F41LUR3-F1R57 with Failure-First
adrianwedd Mar 26, 2026
c0817ae
deploy: +6 daily papers, compliance paradox, iatrogenic safety
adrianwedd Mar 27, 2026
15aa543
deploy: +4 blog posts (defenses correction, DETECTED_PROCEEDS, capabi…
adrianwedd Mar 27, 2026
a0b3d4a
deploy: new OG image (Forensic Lab design)
adrianwedd Mar 27, 2026
9a1d192
deploy: G0DM0D3 daily paper with NLM audio + infographic
adrianwedd Mar 27, 2026
bb1aebe
deploy: SPM temperature dial blog post
adrianwedd Mar 27, 2026
1ff3d6c
deploy: defense bypass + L1B3RT45 corpus blog posts with NLM assets
adrianwedd Mar 28, 2026
208fa31
fix: rename daily paper files to URL-safe slugs (no spaces/colons)
adrianwedd Mar 28, 2026
6b81408
fix: remove duplicate image: lines in frontmatter
adrianwedd Mar 28, 2026
4492bb9
fix: pass per-post OG image to BaseLayout in BlogPostLayout
adrianwedd Mar 28, 2026
aaf6b4f
deploy: +5 NLM infographic OG images (daily papers)
adrianwedd Mar 28, 2026
944fc18
deploy: daily paper infographic updates
adrianwedd Mar 28, 2026
3b7e8cc
deploy: 67% wall + positional bias posts
adrianwedd Mar 28, 2026
c128472
blog: Sprint 16 threat synthesis -- five findings from March 2026
adrianwedd Mar 28, 2026
26b5a19
deploy: Sprint 16 threat synthesis post + full site rebuild (720 pages)
adrianwedd Mar 28, 2026
7bb838e
deploy: add 39 daily papers, fix correction notice a11y, schema flexi…
adrianwedd Mar 29, 2026
5c6994b
fix: remove 7 daily-paper duplicates from blog collection, fix future…
adrianwedd Mar 29, 2026
310f26a
fix: add redirects for 7 blog→daily-paper moves, add brand specs
adrianwedd Mar 29, 2026
16495ee
chore: remove .DS_Store and .superpowers artifacts, add to .gitignore
adrianwedd Mar 29, 2026
2f6f1cb
chore: remove .vscode from tracking, add to .gitignore
adrianwedd Mar 29, 2026
9072dec
deploy: 14 new NLM infographics, fix all broken image refs (750 pages)
adrianwedd Mar 29, 2026
49e7b07
deploy: add image frontmatter for 6 newly infographic'd daily papers …
adrianwedd Mar 29, 2026
c726769
content: remove 6 duplicate blog posts + 16 misplaced daily-paper ent…
adrianwedd Mar 29, 2026
62e55d7
deploy: rebuild after blog dedup + daily-paper cleanup (728 pages)
adrianwedd Mar 29, 2026
d41033e
fix: restore 7 paper reviews to /blog/ (original shared URLs), remove…
adrianwedd Mar 29, 2026
460b5c9
deploy: 13 new daily papers from DeepInception citation network (740 …
adrianwedd Mar 29, 2026
42de0ac
feat: cross-link blog and daily-paper collections
adrianwedd Mar 29, 2026
aa56ea4
fix: move 7 paper reviews to /daily-paper/ with OG-preserving redirec…
adrianwedd Mar 29, 2026
cb8e04c
brand: new fractured hexagon logo + OG image
adrianwedd Mar 29, 2026
d69e4b6
docs: update 5 stale public-facing files to reflect current project s…
adrianwedd Mar 29, 2026
7f948bc
feat: add content type filters to /new/ page + include reports/policy…
adrianwedd Mar 29, 2026
9380cc1
refresh: /about/ page with current metrics, key findings, stat grid
adrianwedd Mar 29, 2026
23ad674
disclosure: add "How this team works" section to /about/team/
adrianwedd Mar 29, 2026
bdea5a6
site: add Google AdSense (ca-pub-6275306310835906) to BaseLayout head
adrianwedd Mar 30, 2026
7088297
site: add ads.txt for Google AdSense verification
adrianwedd Mar 30, 2026
311ab58
deploy: AdSense, ads.txt, visual jailbreaks blog post draft
adrianwedd Mar 30, 2026
4362304
deploy: eight-layers-of-visual-jailbreaks blog post
adrianwedd Mar 30, 2026
dd68c6a
fix: remove daily-paper media from docs/ to fix GitHub Pages build
adrianwedd Mar 30, 2026
c0fb702
fix: remove 15 duplicate daily-paper posts (same arxiv ID, different …
adrianwedd Mar 30, 2026
f367d38
daily-paper: +16 posts filling Mar 20-22, 24-26, 30-31 gaps
adrianwedd Mar 30, 2026
a8b9fab
deploy: +16 daily papers, dedup cleanup, coverage Mar 20-31
adrianwedd Mar 30, 2026
538b64b
daily-paper: +2 papers for 2026-03-30 [automated]
claude Mar 30, 2026
585ce7f
fix: migrate daily-paper media paths to cdn.failurefirst.org (R2)
adrianwedd Mar 31, 2026
6bc026f
Merge branch 'main' of https://github.com/adrianwedd/failure-first
adrianwedd Mar 31, 2026
71f6d3d
deploy: CDN media paths, automated daily papers
adrianwedd Mar 31, 2026
74a4931
site: publish all research papers with PDF downloads
adrianwedd Mar 31, 2026
18300a4
deploy: papers + CDN media cleanup (ENOSPC fix)
adrianwedd Mar 31, 2026
2419eec
site: add Cloudflare Web Analytics + Sentry placeholders to BaseLayout
adrianwedd Mar 31, 2026
9679ad0
site: add Sentry error tracking (@sentry/astro) + CF Web Analytics pl…
adrianwedd Mar 31, 2026
ffef991
build(deps): bump h3 from 1.15.5 to 1.15.10 in /site
dependabot[bot] Mar 31, 2026
Binary file removed .DS_Store
Binary file not shown.
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
# Afterwords TTS voice override
.afterwords

# OS files
.DS_Store

# IDE
.vscode/

# Superpowers brainstorm artifacts
.superpowers/
91 changes: 49 additions & 42 deletions CONTRIBUTING.md
@@ -1,62 +1,69 @@
# Contributing to Failure-First Embodied AI
# Contributing to Failure-First

Thank you for your interest in contributing to Failure-First Embodied AI!
Thank you for your interest in Failure-First. This is a **research project**, not a typical open-source codebase. Contributions are welcome, but the ways to contribute differ from a standard software project.

## Important: Public Repository Context
## How to Contribute

This is the **public-facing** repository for the Failure-First research project. Contributions must adhere to strict safety guidelines to ensure all content remains:
- Pattern-level only (never operational)
- Defensively purposed
- Appropriate for public academic discourse
### Report Issues

## What to Contribute
If you find errors in our published findings, methodology gaps, broken links on [failurefirst.org](https://failurefirst.org), or inconsistencies in the public documentation, please open a GitHub issue.

**✅ Welcome Contributions:**
- Documentation improvements
- Research methodology clarifications
- Failure taxonomy additions (pattern-level)
- Website improvements
- Typo fixes and clarity improvements
### Cite Our Work

**❌ Not Accepted:**
- Operational exploit code
- Working jailbreak prompts
- Model-specific bypass techniques
- Raw test results or adversarial datasets
The most impactful contribution for a research project is citation. If our findings, datasets, or methodology inform your work, please cite us:

## Contribution Process
```bibtex
@software{failure_first_2026,
title = {Failure-First: Adversarial Evaluation Framework for Embodied AI},
author = {Wedd, Adrian},
year = {2026},
url = {https://failurefirst.org},
note = {227 models, 141{,}561 prompts, 337 attack techniques}
}
```

1. **Fork** the repository
2. **Create a branch** for your changes
3. **Make your changes** following our guidelines
4. **Submit a pull request** with a clear description
### Red-Team Collaboration

## Safety Review
We welcome collaboration with AI safety researchers, red-team practitioners, and frontier lab security teams. If you have adversarial evaluation results, novel attack technique taxonomies, or defense effectiveness data you would like to contribute or cross-validate, open a GitHub issue describing your institutional affiliation and research focus.

### Dataset Contributions

If you have adversarial evaluation datasets that could strengthen the corpus, we accept contributions subject to:

- **Pattern-level only**: no operational exploits or copy-paste attack templates
- **Provenance documented**: source, collection methodology, and intended use
- **Schema compliance**: data must conform to our versioned JSON Schemas (documented in the private repository; we will assist with formatting)
- **Safety review**: all contributed data undergoes review before inclusion

### Documentation Improvements

Corrections, clarifications, and improvements to public-facing documentation (this repository, the design charter, security policy) are welcome via pull request.

All contributions undergo safety review to ensure:
- No operational exploit instructions
- Pattern-level descriptions only
- Appropriate for public repository
- Aligned with defensive research mission
## What We Do Not Accept

## Code of Conduct
- Operational exploit code or working jailbreak prompts
- Model-specific bypass techniques intended for attack
- Raw adversarial datasets without provenance
- Content that facilitates real-world harm outside AI safety research

- Be respectful and professional
- Focus on defensive AI safety research
- No weaponization of research findings
- Maintain academic integrity
## Vulnerability Reporting

## Questions?
If you discover vulnerabilities in AI systems -- whether through this framework or independent research -- please follow responsible disclosure practices. See [SECURITY.md](SECURITY.md) for our coordinated disclosure process.

- **Issues**: Open a GitHub issue for questions or suggestions
- **Discussions**: Use GitHub Discussions for research-related conversations
## Process

1. Open a GitHub issue describing the proposed contribution
2. For documentation changes, submit a pull request directly
3. For research collaborations and dataset contributions, we will coordinate via issue discussion

## Safety Review

All contributions undergo safety review to ensure content remains pattern-level, defensively purposed, and appropriate for a public repository. This review is not optional and applies equally to maintainers and external contributors.

## License

By contributing, you agree that your contributions will be licensed under the MIT License, the same license as this project.
By contributing, you agree that your contributions will be licensed under the MIT License.

---

**Remember:** This is defensive AI safety research. All contributions should strengthen defenses, not enable attacks.

**Last updated:** 2026-02-01
**Last updated:** 2026-03-29
40 changes: 29 additions & 11 deletions DESIGN_CHARTER.md
@@ -22,7 +22,14 @@ This is a **research methodology for studying AI safety through systematic failu

At its center is a principle: **failure is signal, not noise**.

The framework exists to support *rigorous failure analysis, defensive research, and safety boundary mapping*.
The framework exists to support *rigorous failure analysis, defensive research, and safety boundary mapping* across the full landscape of adversarial AI evaluation:

- **Jailbreak archaeology**: systematic study of how adversarial techniques evolve across eras, from early DAN-style prompts through crescendo attacks, format-lock exploitation, and reasoning-chain manipulation
- **VLA safety evaluation**: 42 attack families targeting vision-language-action models, covering affordance manipulation, kinematic injection, safety instruction dilution, and dual-layer attacks
- **Multi-turn escalation**: stateful attack sequences that exploit context accumulation in reasoning and tool-using models
- **Format-lock attacks**: structured output compliance (JSON, YAML, code) as an attack vector against safety training
- **Defense effectiveness measurement**: quantifying how safety mechanisms perform under adversarial pressure, including positional bias, iatrogenic effects, and cross-model vulnerability inheritance
- **Classifier reliability**: documenting the gap between heuristic and LLM-graded attack success rates
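The gap between heuristic and LLM-graded attack success rates can be illustrated with a small sketch. It is not the project's grading pipeline: the labels below are made-up placeholders, and Cohen's kappa is an assumed choice of agreement statistic, since the charter does not name one.

```python
def attack_success_rate(labels):
    """Fraction of trials labeled as a successful attack (1 = success, 0 = refusal)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two binary graders over the same trials."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    rate_a = sum(labels_a) / n
    rate_b = sum(labels_b) / n
    # Agreement expected if both graders labeled independently at their own base rates.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both graders are constant and identical
    return (observed - expected) / (1 - expected)


# Hypothetical labels for the same eight trials (illustrative only, not project data).
heuristic_labels = [1, 1, 1, 0, 1, 0, 1, 1]
llm_labels = [1, 0, 0, 0, 1, 0, 0, 1]

if __name__ == "__main__":
    print("heuristic ASR:", attack_success_rate(heuristic_labels))
    print("LLM-graded ASR:", attack_success_rate(llm_labels))
    print("Cohen's kappa:", cohens_kappa(heuristic_labels, llm_labels))
```

Reporting both rates alongside an agreement statistic makes it visible when a headline ASR depends mostly on which grader produced it.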

This is not a demonstration.
It is not an attack platform.
@@ -59,7 +66,7 @@ The tooling must not pretend otherwise.
These are not preferences.
They are constraints.

If an implementation violates any of the following, it is wrong—even if it "works".
If an implementation violates any of the following, it is wrong -- even if it "works".

### 3.1 Pattern-Level Only, Never Operational

@@ -109,6 +116,14 @@ If an implementation violates any of the following, it is wrong—even if it "wo
- No security through obscurity
- If the process cannot withstand scrutiny, it should not be published

### 3.7 Statistical Rigor Over Convenience

- All ASR claims must specify grading methodology (LLM-only, heuristic-only, or coalesced)
- Confidence intervals are required for sample-based estimates
- Multiple-comparison corrections are mandatory for pairwise model comparisons
- Classifier reliability must be documented with inter-rater agreement metrics
- If a finding cannot survive peer review scrutiny, it should not be published
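
Two of the requirements above, inter-rater agreement and multiple-comparison correction, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `cohens_kappa` and `holm_bonferroni` are hypothetical, Cohen's kappa is only one possible agreement metric, and Holm's step-down procedure is only one possible correction. The charter does not say which ones the project uses.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    Hypothetical helper: e.g. labels_a from a heuristic classifier and
    labels_b from an LLM grader, one label per evaluated response.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction; returns reject decisions in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k).
        if p_values[idx] > alpha / (m - rank):
            break  # stop at the first hypothesis that is not rejected
        reject[idx] = True
    return reject
```

In this sketch, each pairwise model comparison would contribute one p-value to `holm_bonferroni`, and `cohens_kappa` would be reported alongside any ASR figure that depends on a classifier.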

---

## 4. Architectural Decisions (Final)
@@ -121,8 +136,10 @@ They are no longer exploratory.
- **Validation**: JSON Schema + custom invariant checks
- **Safety gates**: Heuristic linters + manual review
- **Benchmark runners**: Model-agnostic (CLI templates + HTTP API adapters)
- **Scoring**: Refusal quality, recovery mechanisms, attack success rates
- **Scoring**: Three-tier ASR (strict, broad, functionally dangerous) with Wilson CIs
- **Classification**: LLM-graded (FLIP protocol) with documented grader reliability
- **Version control**: Git, with CI enforcement of validation/linting
- **Database**: SQLite corpus unifying prompts, techniques, and evaluation results
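
The Wilson interval named in the scoring decision can be sketched as follows. This is a minimal illustration rather than the project's scoring code: the function name `wilson_interval`, the default `z = 1.96` (roughly 95% coverage), and the idea of calling it once per ASR tier are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    Hypothetical helper: `successes` is the number of graded attack
    successes for one ASR tier, `trials` the number of graded attempts.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # The centre is pulled toward 0.5, which keeps the interval sensible
    # for small samples and for proportions near 0 or 1.
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the corpus.
    for tier, hits in [("strict", 12), ("broad", 20)]:
        low, high = wilson_interval(hits, 100)
        print(f"{tier}: {hits}/100, CI [{low:.3f}, {high:.3f}]")
```

Unlike the normal-approximation interval, the Wilson interval never extends below 0 or above 1 and stays informative when zero successes are observed, which matters for models that refuse nearly everything.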

If a future change requires revisiting these decisions, the charter must be updated explicitly.

@@ -152,7 +169,7 @@ If adversarial content can be weaponized directly, it has violated this charter.

This document defines intent and limits.

Implementation details—including tool design, algorithms, dataset schema extensions, benchmark metrics, and acceptance criteria—are defined in issues, documentation, and code.
Implementation details -- including tool design, algorithms, dataset schema extensions, benchmark metrics, and acceptance criteria -- are defined in issues, documentation, and code.

The relationship is explicit:

@@ -210,7 +227,7 @@ When using AI systems to generate adversarial scenarios:

## 9. Research Ethics Boundaries

This project operates within established AI safety research norms:
This project operates within established AI safety research norms. A full research ethics charter is maintained in the private repository.

### Acceptable Research Activities

@@ -219,6 +236,7 @@ This project operates within established AI safety research norms:
- Testing robustness of safety mechanisms
- Improving alignment under adversarial pressure
- Publishing defensive research findings
- Coordinated vulnerability disclosure to model providers

### Unacceptable Activities

@@ -230,10 +248,10 @@ This project operates within established AI safety research norms:

### Disclosure Standards

- Vulnerabilities discovered through this research should be disclosed responsibly
- Real-world safety issues should be reported to affected parties before public disclosure
- Research findings should distinguish between controlled evaluation and real-world risk
- Limitations of evaluation harnesses must be stated explicitly
- Vulnerabilities discovered through this research are disclosed responsibly
- Real-world safety issues are reported to affected parties before public disclosure
- Research findings distinguish between controlled evaluation and real-world risk
- Limitations of evaluation harnesses are stated explicitly

---

@@ -249,8 +267,8 @@ This charter may evolve as the project grows, but changes must be:
Minor clarifications (typo fixes, example additions) do not require versioning.
Substantive changes (adding/removing principles, changing constraints) require charter version increment.

**Current version**: 1.0
**Last updated**: 2025-01-11
**Current version**: 2.0
**Last updated**: 2026-03-29

---

13 changes: 8 additions & 5 deletions MANIFEST.json
@@ -3,14 +3,17 @@
"note": "Full traces available under NDA. Contact via GitHub issue.",
"generated_from": "failure-first-embodied-ai (private)",
"totals": {
"files": 632,
"files": 860,
"invariant_errors": 0,
"json_parse_errors": 0,
"rows": 51201,
"rows": 60847,
"schema_errors": 0,
"failure_classes": 661,
"domains": 19,
"models_evaluated": 51
"prompts": 141561,
"results": 133646,
"techniques": 337,
"harm_classes": 124,
"domains": 28,
"models_evaluated": 227
},
"packs_by_kind": {
"adversarial_poetry": 3,