Building
Mornings red-teaming LLMs, evenings analyzing the failures, nights making them more robust.
- Mohali, India
- in/abhishekupadhayay
Pinned Loading
-
cai-vs-rlhf-social-modulation
cai-vs-rlhf-social-modulation PublicComparative evaluation of Constitutional AI vs RLHF guardrail robustness under sustained socially conditioned multi turn red teaming. Documents social modulation mechanism..
-
hinglish-prompt-injection-detector
hinglish-prompt-injection-detector PublicLightweight hybrid defense against stealth prompt injection in Hinglish. Combines a 22‑rule Contextual Guard with an L12 SVM classifier. 98.4% detection on stealth benchmarks, 0.6% FPR, CPU‑only.
Jupyter Notebook 1
Something went wrong, please refresh the page to try again.
If the problem persists, check the GitHub status page or contact support.
If the problem persists, check the GitHub status page or contact support.