Add Resemble AI Detect guardrail plugin#1610
devshahofficial wants to merge 1 commit into Portkey-AI:main
Conversation
Adds a new guardrail plugin that scans audio, video, and image URLs referenced in LLM requests for deepfake / synthetic content via the Resemble AI Detect API (https://app.resemble.ai/api/v2/detect). The handler supports:

- `beforeRequestHook` and `afterRequestHook`
- URL extraction from multimodal content parts (OpenAI `input_audio` / `image_url`, Anthropic `source.url`), regex over plain text, or `context.metadata` fallback (selectable via `urlSource`)
- Optional audio source tracing (identifies the TTS vendor), reverse image search, and Zero Retention Mode
- Configurable threshold, polling interval / timeout, and fail-open vs fail-closed behaviour on API errors

23 Jest tests cover URL extraction, evaluation, the handler happy path, polling, timeouts, fail modes, and base-URL override.
Audio deepfake detection is an important guardrail layer as voice-enabled agents become more common — a voice agent that cannot verify whether the audio it received is authentic is vulnerable to real-time voice cloning attacks. Resemble AI Detect operates at the content layer, which complements an agent identity layer: a detected deepfake tells you the audio is synthetic, but not which agent sent it. For multi-agent voice pipelines (e.g., one agent calling another over a voice channel), a positive deepfake detection identifies the attack but not the attacker; attribution requires agent identity at the call layer alongside content-level detection. A combined trust check for voice-enabled agent endpoints:

```javascript
const result = await resembleDetect.scan(audioBuffer);
if (result.deepfake_score > THRESHOLD) {
  const callerAgentId = request.headers["X-Agent-ID"] || "unknown";
  emit_security_event("audio_deepfake_detected", {
    deepfake_score: result.deepfake_score,
    caller_agent_id: callerAgentId,
    // was this caller already suspect?
    caller_trust_score: await satp.getScore(callerAgentId),
  });
  return deny();
}
```

Happy to contribute an implementation that includes caller identity context alongside the deepfake score.
Summary
Adds a new guardrail plugin that scans audio, video, and image URLs referenced in LLM requests for deepfake / synthetic content via the Resemble AI Detect API.
The plugin runs in `beforeRequestHook` and `afterRequestHook`. It extracts a media URL from the request, submits it to `POST /detect`, polls `GET /detect/{uuid}` until completion, and returns `verdict: false` when Resemble labels the media as `fake` or the aggregated score exceeds the configured threshold.
Why
Deepfakes are increasingly being used to manipulate AI workflows — voice-cloned audio pasted into a chat-completion prompt, synthetic images submitted to a multimodal model, video content slipped through a RAG pipeline. Portkey users asking "is this input real?" currently have no option in the plugins catalog. Resemble Detect covers audio, video, and image from a single endpoint, and ships with useful side signals like audio source tracing (identifies which TTS vendor — ElevenLabs, Resemble, PlayHT, OpenAI — generated flagged audio) and reverse image search for images.
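The submit-then-poll flow summarized above can be sketched as follows. This is a sketch only: the HTTP layer is injected so it can be faked in tests, and the response field names (`uuid`, `status`) are assumptions, not confirmed Detect API schema.

```typescript
// Injected HTTP layer so the flow can be unit-tested without a network.
type Fetcher = (
  method: 'GET' | 'POST',
  path: string,
  body?: unknown,
) => Promise<any>;

// Submit the media URL, then poll /detect/{uuid} until completion or
// until timeoutMs elapses (mirroring the behaviour described above).
async function detect(
  http: Fetcher,
  mediaUrl: string,
  pollIntervalMs = 1000,
  timeoutMs = 30000,
): Promise<any> {
  const { uuid } = await http('POST', '/detect', { url: mediaUrl });
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await http('GET', `/detect/${uuid}`);
    if (res.status === 'completed') return res;
    await new Promise((r) => setTimeout(r, pollIntervalMs));
  }
  throw new Error('Resemble Detect polling timed out');
}
```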
What this PR adds
Configuration
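As a sketch of the options described in the summary (field names here are illustrative assumptions, not the plugin's actual schema), a guardrail configuration might look like:

```json
{
  "threshold": 0.8,
  "urlSource": "content",
  "pollIntervalMs": 2000,
  "timeoutMs": 60000,
  "failClosed": false,
  "sourceTracing": true,
  "reverseImageSearch": false,
  "zeroRetention": true,
  "baseUrl": "https://app.resemble.ai/api/v2"
}
```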
URL extraction
The handler searches for a media URL in three places, in order:

1. Multimodal content parts (OpenAI `input_audio` / `image_url`, Anthropic `source.url`)
2. A regex over plain-text content
3. `context.metadata` (fallback)

A specific source can be selected via `urlSource`.
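A minimal sketch of that lookup order, over a simplified content-part shape. The part shapes are pared down from the provider formats named in the summary, and the `mediaUrl` metadata key is an assumption for illustration.

```typescript
// Simplified request shapes; the real handler inspects full provider
// payloads. The `mediaUrl` metadata key is assumed for this sketch.
type ContentPart =
  | { type: 'input_audio'; input_audio: { url: string } }
  | { type: 'image_url'; image_url: { url: string } }
  | { type: 'text'; text: string };

const URL_RE = /https?:\/\/\S+/;

function extractMediaUrl(
  parts: ContentPart[],
  metadata: Record<string, string> = {},
): string | null {
  // 1. Structured multimodal content parts
  for (const p of parts) {
    if (p.type === 'input_audio') return p.input_audio.url;
    if (p.type === 'image_url') return p.image_url.url;
  }
  // 2. Regex over plain-text parts
  for (const p of parts) {
    if (p.type === 'text') {
      const m = p.text.match(URL_RE);
      if (m) return m[0];
    }
  }
  // 3. context.metadata fallback
  return metadata.mediaUrl ?? null;
}
```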
Fail modes
By default the plugin fails open — if the Resemble API errors out (auth, network, timeout), the request still passes through and the error is recorded in `data`. Set `failClosed: true` to block on API errors. This matches the pattern in other guardrail plugins.
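The two decision points, the fake-label/threshold rule from the summary and the fail-open vs fail-closed choice above, can be sketched like this. Result and config field names are assumptions, not the plugin's actual schema.

```typescript
// Field names below are illustrative, not the real plugin's schema.
interface DetectOutcome {
  label: 'fake' | 'real';
  score: number; // aggregated score in [0, 1]
}

interface GuardrailConfig {
  threshold: number;
  failClosed?: boolean; // default: fail open
}

// verdict: false blocks the request. The request fails when Resemble
// labels the media `fake` OR the aggregated score exceeds the threshold.
function evaluate(outcome: DetectOutcome, cfg: GuardrailConfig): boolean {
  if (outcome.label === 'fake') return false;
  return outcome.score <= cfg.threshold;
}

// When the Detect API itself errors (auth, network, timeout), fail open
// by default: pass the request through and record the error in `data`.
function onApiError(cfg: GuardrailConfig, err: Error) {
  return {
    verdict: cfg.failClosed ? false : true,
    data: { error: err.message },
  };
}
```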
Test plan
```
Test Suites: 1 passed, 1 total
Tests: 23 passed, 23 total
```
Notes