Add Resemble AI Detect guardrail plugin #1610

Open

devshahofficial wants to merge 1 commit into Portkey-AI:main from devshahofficial:resemble-detect-plugin

Conversation

@devshahofficial

Summary

Adds a new guardrail plugin that scans audio, video, and image URLs referenced in LLM requests for deepfake / synthetic content via the Resemble AI Detect API.

The plugin runs in beforeRequestHook and afterRequestHook. It extracts a media URL from the request, submits it to POST /detect, polls GET /detect/{uuid} until completion, and returns verdict: false when Resemble labels the media as `fake` or the aggregated score exceeds the configured threshold.
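The verdict step described above can be sketched roughly as follows. The `DetectResult` shape and `evaluate` name are illustrative, not the actual handler code or the exact Resemble response schema:

```typescript
// Hypothetical shape of a completed Detect result; field names are
// illustrative approximations of the Resemble Detect API response.
interface DetectResult {
  label: 'real' | 'fake';
  score: number; // aggregated 0..1 score; higher means more likely synthetic
}

// verdict: false means the guardrail flags the media.
function evaluate(result: DetectResult, threshold: number): boolean {
  if (result.label === 'fake') return false;
  return result.score <= threshold;
}
```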

Why

Deepfakes are increasingly being used to manipulate AI workflows — voice-cloned audio pasted into a chat-completion prompt, synthetic images submitted to a multimodal model, video content slipped through a RAG pipeline. Portkey users asking "is this input real?" currently have no option in the plugins catalog. Resemble Detect covers audio, video, and image from a single endpoint, and ships with useful side signals like audio source tracing (identifies which TTS vendor — ElevenLabs, Resemble, PlayHT, OpenAI — generated flagged audio) and reverse image search for images.

What this PR adds

  • `plugins/resemble/manifest.json` — plugin declaration with credentials schema, parameters, and supported hooks
  • `plugins/resemble/detect.ts` — handler, URL extraction, polling, and verdict logic
  • `plugins/resemble/detect.test.ts` — 23 jest tests
  • Registers `resemble: { detect: ... }` in `plugins/index.ts`

Configuration

| Param | Default | Description |
| --- | --- | --- |
| `credentials.apiKey` | — (required) | Resemble API token (encrypted) |
| `credentials.apiBase` | `https://app.resemble.ai/api/v2` | Override for self-hosted / staging |
| `threshold` | `0.5` | Aggregated score above which media is treated as fake |
| `mediaType` | `auto` | Force `audio` / `video` / `image`, or let Resemble auto-detect |
| `audioSourceTracing` | `false` | Identify TTS vendor for flagged audio |
| `useReverseSearch` | `false` | Run reverse image search for images |
| `zeroRetentionMode` | `false` | Delete media after detection completes |
| `urlSource` | `auto` | Where to look for the URL: `auto`, `metadata`, or `content` |
| `metadataKey` | `mediaUrl` | Metadata key when `urlSource` is `metadata` or `auto` |
| `pollIntervalMs` | `2000` | How often to poll Resemble |
| `pollTimeoutMs` | `60000` | Max wait before failing open |
| `failClosed` | `false` | If `true`, API errors block the request |
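For reference, a guardrail check using these parameters might look roughly like this. The `checks` / `id` wrapper shape is an assumption based on other Portkey guardrails; consult the plugin schema for the exact format:

```json
{
  "checks": [
    {
      "id": "resemble.detect",
      "parameters": {
        "credentials": { "apiKey": "<RESEMBLE_API_KEY>" },
        "threshold": 0.5,
        "urlSource": "auto",
        "metadataKey": "mediaUrl",
        "failClosed": false
      }
    }
  ]
}
```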

URL extraction

The handler searches for a media URL in three places, in order:

  1. Multimodal content parts — OpenAI-style `input_audio`, `image_url`, and Anthropic-style `source.url` (image / document)
  2. Regex over joined text — matches https URLs ending in common audio/video/image extensions, inside message content, `prompt`, or `input`
  3. `context.metadata[metadataKey]` — fallback for the `auto` mode, primary source when `urlSource: 'metadata'`
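The lookup order above can be sketched as follows. This is a simplified illustration (the `findMediaUrl` signature is hypothetical); the real extraction also walks OpenAI `input_audio` parts and Anthropic `source.url` blocks rather than taking a pre-collected URL list:

```typescript
// Matches https URLs ending in common audio/video/image extensions.
const MEDIA_URL_RE = /https:\/\/\S+\.(?:mp3|wav|mp4|webm|png|jpe?g|gif)\b/i;

function findMediaUrl(
  text: string,
  contentUrls: string[],
  metadata: Record<string, string>,
  metadataKey: string
): string | undefined {
  // 1. Multimodal content parts take priority.
  if (contentUrls.length > 0) return contentUrls[0];
  // 2. Regex over the joined message text.
  const match = text.match(MEDIA_URL_RE);
  if (match) return match[0];
  // 3. Metadata fallback.
  return metadata[metadataKey];
}
```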

Fail modes

By default the plugin fails open — if the Resemble API errors out (auth, network, timeout), the request still passes through and the error is recorded in `data`. Set `failClosed: true` to block on API errors. This matches the pattern in other guardrail plugins.
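The fail-open / fail-closed decision amounts to something like the sketch below (names are illustrative, not the handler's actual types):

```typescript
interface GuardrailResult {
  verdict: boolean; // false blocks the request
  error?: string;
}

// On an API error: fail open (pass the request) unless failClosed is set,
// and always record the error so it surfaces in the hook's `data`.
function onApiError(err: Error, failClosed: boolean): GuardrailResult {
  return {
    verdict: !failClosed,
    error: err.message,
  };
}
```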

Test plan

  • 23 jest tests pass (`npm run test:plugins -- plugins/resemble`)
  • Prettier check passes
  • Existing plugins tests unaffected
  • Manifest conforms to the plugin schema (loads via plugin manager)

```
Test Suites: 1 passed, 1 total
Tests: 23 passed, 23 total
```

Notes

  • This is the first new guardrail under the `resemble` namespace — happy to adjust the ID / naming to fit your conventions before merge.
  • Resemble AI will co-announce the plugin on launch and add Portkey to our integrations documentation.

Adds a new guardrail plugin that scans audio, video, and image URLs
referenced in LLM requests for deepfake / synthetic content via the
Resemble AI Detect API (https://app.resemble.ai/api/v2/detect).

The handler supports:
- beforeRequestHook and afterRequestHook
- URL extraction from multimodal content parts (OpenAI input_audio /
  image_url, Anthropic source.url), regex over plain text, or
  context.metadata fallback (selectable via urlSource)
- Optional audio source tracing (identifies TTS vendor), reverse image
  search, and Zero Retention Mode
- Configurable threshold, polling interval / timeout, and fail-open vs
  fail-closed behaviour on API errors

23 jest tests cover URL extraction, evaluation, the handler happy path,
polling, timeouts, fail modes, and base-URL override.
@0xbrainkid

Audio deepfake detection is an important guardrail layer as voice-enabled agents become more common — a voice agent that cannot verify whether the audio it received is authentic is vulnerable to real-time voice cloning attacks.

Resemble AI Detect operates at the content layer, which complements rather than replaces an agent identity layer: a detected deepfake tells you the audio is synthetic, but not which agent sent it. For multi-agent voice pipelines (e.g., one agent calling another via voice channel), a deepfake detection positive identifies the attack but not the attacker — attribution requires agent identity at the call layer alongside content-level detection.

A combined trust check for voice-enabled agent endpoints:

```
const result = await resembleDetect.scan(audioBuffer);
const callerAgentId = request.headers["X-Agent-ID"] || "unknown";
if (result.deepfake_score > THRESHOLD) {
    emit_security_event("audio_deepfake_detected", {
        deepfake_score: result.deepfake_score,
        caller_agent_id: callerAgentId,
        caller_trust_score: await satp.getScore(callerAgentId), // was this caller already suspect?
    });
    return deny();
}
```

The caller_trust_score at detection time matters: a deepfake attempt from an agent with a history of trust violations warrants immediate revocation, not just the current request denial. An agent with a clean history might be compromised rather than malicious, warranting a different response.

Happy to contribute an implementation that includes caller identity context alongside the deepfake score.

