Extending the original PromptInjector with dynamic rules, ML scoring, semantic detection, and explainability. This project builds on top of existing tool: PromptInjector (https://github.com/nayangoel/PromptInjector) repository, which focuses on generating adversarial prompts for red teaming large language models. The original tool is offensive oriented: it produces jailbreak attempts, role play bypasses, and other adversarial inputs to test LLM robustness. This enhanced version adds a full defensive layer: a modular, explainable promptinjection classifier that detects, scores, and explains suspicious user inputs using a hybrid of rules, machine learning, and semantic similarity. Project created with AI assistance.
tschecurity/ML-Plus-JSON-Classifier-Integrated-With-PromptInjector
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|