Summary
For voice-enabled multi-agent systems, agents need a reliable speech-to-text (STT) backend. FunASR (16K+ stars) now offers an OpenAI-compatible transcription API that AutoGen agents can use directly.
Proposed Integration
FunASR provides examples/openai_api/server.py, a FastAPI server that exposes /v1/audio/transcriptions (same request format as OpenAI). Any AutoGen agent that uses OpenAI audio transcription can switch to FunASR by changing the base_url:
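    from openai import OpenAI

    # Point the client at the local FunASR server instead of the OpenAI endpoint
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")

    # Use in agent tool
    def transcribe_audio(file_path: str) -> str:
        # Open the audio in binary mode and close the handle after the upload
        with open(file_path, "rb") as audio_file:
            result = client.audio.transcriptions.create(
                model="sensevoice", file=audio_file
            )
        return result.text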
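When a function like transcribe_audio is handed to an agent as a tool, a missing file or an unreachable server would otherwise surface as an exception inside the agent loop. The following is a minimal sketch of a guard wrapper, assuming the agent framework accepts a plain callable that takes a path and returns a string. The names make_audio_tool, AUDIO_SUFFIXES, and the transcribe parameter are hypothetical and are not part of FunASR or AutoGen.

```python
from pathlib import Path
from typing import Callable

# Hypothetical allow-list; adjust to the formats the server actually accepts.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a"}


def make_audio_tool(transcribe: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a transcription function so failures come back as text."""

    def tool(file_path: str) -> str:
        path = Path(file_path)
        if not path.is_file():
            return f"error: file not found: {file_path}"
        if path.suffix.lower() not in AUDIO_SUFFIXES:
            return f"error: unsupported audio type: {path.suffix}"
        try:
            return transcribe(str(path))
        except Exception as exc:
            # Report the failure to the agent instead of raising.
            return f"error: transcription failed: {exc}"

    return tool
```

With the function from this issue, the wrapped tool would be created as `safe_transcribe = make_audio_tool(transcribe_audio)`. Returning an error string rather than raising lets the agent read the failure and decide what to do next.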
Advantages over OpenAI Whisper API
| | FunASR | OpenAI Whisper API |
|---|---|---|
| Speed | 170x realtime | ~1x realtime |
| Cost | Free (self-hosted) | $0.006/min |
| Privacy | Local, no data leaves the machine | Cloud |
| Speaker ID | Built-in | No |
| Languages | 50+ | 57 |
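To put the speed and cost rows in concrete terms, here is a small worked calculation. It takes the figures in the comparison above at face value ($0.006/min and 170x realtime are this issue's claims, not measurements made here) and ignores the hardware and electricity cost of self-hosting. The function and constant names are illustrative only.

```python
# Figures taken from the comparison above; treat them as claims, not benchmarks.
WHISPER_API_USD_PER_MIN = 0.006
FUNASR_REALTIME_FACTOR = 170.0


def whisper_api_cost_usd(audio_minutes: float) -> float:
    """API cost for a given amount of audio at the per-minute price."""
    return audio_minutes * WHISPER_API_USD_PER_MIN


def funasr_processing_seconds(audio_minutes: float) -> float:
    """Wall-clock seconds to process the audio at the claimed realtime factor."""
    return audio_minutes * 60.0 / FUNASR_REALTIME_FACTOR


hours = 100
minutes = hours * 60
print(f"{hours} h via the API: ${whisper_api_cost_usd(minutes):.2f}")
print(f"{hours} h at 170x realtime: {funasr_processing_seconds(minutes) / 60:.1f} min")
```

Under these assumptions, 100 hours of audio would cost about $36 through the API and take roughly 35 minutes of processing on the self-hosted server.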
Setup
Install the dependencies, then start the server from a checkout of the FunASR repository:
    pip install funasr fastapi uvicorn python-multipart
    python examples/openai_api/server.py --device cuda
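If agents start at the same time as the server, the first transcription call may fail while the server is still coming up. Below is a minimal sketch of a readiness check, assuming the server listens on localhost:8000 (the port is inferred from the base_url used earlier in this issue). The helper name wait_for_server is hypothetical. It only confirms that the TCP port accepts connections, not that the model has finished loading.

```python
import socket
import time


def wait_for_server(host: str = "localhost", port: int = 8000,
                    timeout: float = 30.0) -> bool:
    """Return True once a TCP connection succeeds, False if the timeout passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry shortly
    return False
```

An agent setup script could call `wait_for_server()` once before registering the transcription tool and abort with a clear message if it returns False.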
Repo: https://github.com/modelscope/FunASR