- Canada
- bigsnarfdude.github.io
Pinned
- iatrogenic_effect (Public): Mech interp experiments on iatrogenic effects, Llama-3.1-8B/70B base vs. instruct
  Python 1
- attentional_hijacking (Public): Six core experiments that demonstrate and characterize the mechanism of attentional hijacking across Gemma 3 4B, 12B, and 27B (IT and PT variants).
  Python
- researchRalph (Public): Autonomous research using a multi-agent swarm to run experiments
  Python 1
- ICML_experiments (Public): Salience-weighted attentional hijacking: ablation experiments for the ICML MechInterp Workshop
  HTML
- softmaxExperiments (Public): Mechanistic interpretability: Truth Jailbreak attentional hijacking experiments on transformers
  Python
- mindreader (Public): Fine-tuned classifiers for chain-of-thought deception detection: training code and weights
  Python