Abhishek Upadhayay kyahikaru

💭

Building

Mornings red-teaming LLMs, evenings analyzing the failures, nights making them more robust.

Pinned

cai-vs-rlhf-social-modulation Public

Comparative evaluation of Constitutional AI vs. RLHF guardrail robustness under sustained, socially conditioned multi-turn red teaming. Documents social modulation mechanism..

1
hinglish-prompt-injection-detector Public

Lightweight hybrid defense against stealth prompt injection in Hinglish. Combines a 22‑rule Contextual Guard with an L12 SVM classifier. 98.4% detection on stealth benchmarks, 0.6% FPR, CPU‑only.

Jupyter Notebook 1