[Feature] FunASR as self-hosted speech-to-text tool for voice agents #7742

@LauraGPT

Summary

Voice-enabled multi-agent systems need a reliable speech-to-text (STT) backend. FunASR (16K+ stars) now offers an OpenAI-compatible transcription API that AutoGen agents can use directly.

Proposed Integration

FunASR provides examples/openai_api/server.py, a FastAPI server that exposes /v1/audio/transcriptions in the same format as OpenAI. Any AutoGen agent that already calls the OpenAI audio transcription API can switch to FunASR by changing the base_url:

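from openai import OpenAI

# Point the OpenAI client at the local FunASR server; "x" is a placeholder key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")

# Use in agent tool
def transcribe_audio(file_path: str) -> str:
    # Open inside a with-block so the file handle is closed after the upload
    with open(file_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="sensevoice", file=audio_file
        )
    return result.text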
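
If the tool is going to be registered with an agent, it may help to pass the client in as an argument rather than relying on a module-level global, so the function can be exercised without a running server. The following is only a sketch under that assumption: the `client` parameter, the `model` default, and the explicit file check are illustrative additions, not part of FunASR or AutoGen.

```python
from pathlib import Path
from typing import Any


def transcribe_audio(client: Any, file_path: str, model: str = "sensevoice") -> str:
    """Transcribe one audio file through an OpenAI-compatible client.

    `client` is assumed to expose `audio.transcriptions.create(model=..., file=...)`,
    as the `openai.OpenAI` client does. Injecting it is an illustrative choice.
    """
    path = Path(file_path)
    if not path.is_file():
        # Fail early with a clear message instead of a low-level open() error
        raise FileNotFoundError(f"audio file not found: {file_path}")

    # The with-block closes the file handle once the upload has finished
    with path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

With the real client this would be called as `transcribe_audio(OpenAI(base_url="http://localhost:8000/v1", api_key="x"), "meeting.wav")`, where `meeting.wav` is a hypothetical file name. Switching back to the hosted API would then only require constructing the client differently.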

Advantages over OpenAI Whisper API

| | FunASR | OpenAI Whisper API |
|---|---|---|
| Speed | 170x realtime | ~1x realtime |
| Cost | Free (self-hosted) | $0.006/min |
| Privacy | Local, no data leaves | Cloud |
| Speaker ID | Built-in | No |
| Languages | 50+ | 57 |
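
To make the speed and cost figures concrete, here is a rough back-of-the-envelope calculation using only the numbers quoted in the comparison above. The 100-hour workload is an arbitrary example, and the sketch assumes the 170x figure holds for the whole workload; it ignores GPU and hosting costs on the self-hosted side.

```python
WHISPER_API_USD_PER_MIN = 0.006  # price quoted in the comparison above
FUNASR_REALTIME_FACTOR = 170     # speed quoted in the comparison above


def whisper_api_cost_usd(audio_minutes: float) -> float:
    """API cost for the given amount of audio at the quoted per-minute price."""
    return audio_minutes * WHISPER_API_USD_PER_MIN


def funasr_processing_seconds(audio_minutes: float) -> float:
    """Wall-clock processing time if the quoted realtime factor holds."""
    return audio_minutes * 60 / FUNASR_REALTIME_FACTOR


audio_minutes = 100 * 60  # example workload: 100 hours of audio
print(f"Whisper API cost: ${whisper_api_cost_usd(audio_minutes):.2f}")
print(f"FunASR processing time: {funasr_processing_seconds(audio_minutes) / 60:.1f} min")
```

For this example workload the API cost comes to $36.00, while the self-hosted run would take roughly 35 minutes of compute at the quoted speed.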

Setup

pip install funasr fastapi uvicorn python-multipart
python examples/openai_api/server.py --device cuda

Repo: https://github.com/modelscope/FunASR
